Context¶
As a Data Scientist at ABC Estate Wines, you have been assigned to analyze and forecast the sales of different types of wine in the 20th century. The dataset comprises sales data from the same company but for different wines.
The objective is to gain insights into historical sales patterns and use this information to make accurate forecasts for future wine sales.
By leveraging analytical techniques and forecasting models, you should provide valuable insights and predictions that can help ABC Estate Wines make informed decisions and effectively manage their wine sales in the coming years.
Importing necessary libraries¶
# Data handling
import numpy as np
import pandas as pd
import itertools

# Visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from pylab import rcParams

# Time series decomposition and models
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Regression and evaluation metrics
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.metrics import root_mean_squared_error # returns RMSE directly; avoids TypeError: got an unexpected keyword argument 'squared'
Loading the dataset¶
df = pd.read_csv('/content/Sparkling.csv')
Data Overview¶
Displaying the first few rows of the dataset¶
df.head()
| YearMonth | Sparkling | |
|---|---|---|
| 0 | 1980-01 | 1686 |
| 1 | 1980-02 | 1591 |
| 2 | 1980-03 | 2304 |
| 3 | 1980-04 | 1712 |
| 4 | 1980-05 | 1471 |
df.tail()
| YearMonth | Sparkling | |
|---|---|---|
| 182 | 1995-03 | 1897 |
| 183 | 1995-04 | 1862 |
| 184 | 1995-05 | 1670 |
| 185 | 1995-06 | 1688 |
| 186 | 1995-07 | 2031 |
Checking the shape of the dataset¶
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns.")
There are 187 rows and 2 columns.
Checking the data types of the columns for the dataset¶
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   YearMonth  187 non-null    object
 1   Sparkling  187 non-null    int64 
dtypes: int64(1), object(1)
memory usage: 3.1+ KB
Observations:
- YearMonth is stored as an object (string), while Sparkling is numerical (int64); the date column must be converted before time series analysis.
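An alternative to generating a fresh date range (done in the next section) is to parse the existing `YearMonth` strings directly. This is a minimal sketch on a hypothetical frame mirroring the CSV layout; `demo` is an illustrative name, not part of the notebook:

```python
import pandas as pd

# Hypothetical frame mirroring the Sparkling.csv layout
demo = pd.DataFrame({"YearMonth": ["1980-01", "1980-02", "1980-03"],
                     "Sparkling": [1686, 1591, 2304]})

# Parse the YearMonth strings directly; format="%Y-%m" yields
# the first day of each month as a proper datetime
demo["Time_Stamp"] = pd.to_datetime(demo["YearMonth"], format="%Y-%m")
demo = demo.set_index("Time_Stamp")
print(demo.index.dtype)  # → datetime64[ns]
```

Parsing the column avoids any risk of the generated range drifting out of alignment with the data, at the cost of month-start rather than month-end stamps.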
Data Preprocessing¶
# Create a sequence of monthly timestamps to match the number of rows in the dataframe
Time_Stamp = pd.date_range(
    start='1980-01-01', # The starting date for the time series
    periods=len(df),    # Number of timestamps equals number of rows in df
    freq='M'            # Month-end frequency; newer pandas versions alias this to 'ME'
)
# Display the generated timestamps
Time_Stamp
DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',
'1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
'1980-09-30', '1980-10-31',
...
'1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
'1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
'1995-06-30', '1995-07-31'],
dtype='datetime64[ns]', length=187, freq='ME')
# Add the generated Time_Stamp as a new column in the dataframe
df['Time_Stamp'] = Time_Stamp
# Display the first 5 rows of the dataframe to verify the new column
df.head()
| YearMonth | Sparkling | Time_Stamp | |
|---|---|---|---|
| 0 | 1980-01 | 1686 | 1980-01-31 |
| 1 | 1980-02 | 1591 | 1980-02-29 |
| 2 | 1980-03 | 2304 | 1980-03-31 |
| 3 | 1980-04 | 1712 | 1980-04-30 |
| 4 | 1980-05 | 1471 | 1980-05-31 |
# Set the 'Time_Stamp' column as the dataframe index
# 'inplace=True' means the change will directly modify the existing dataframe (no need to reassign)
df.set_index(keys='Time_Stamp', inplace=True)
# Display the dataframe to confirm that Time_Stamp is now the index
df
| YearMonth | Sparkling | |
|---|---|---|
| Time_Stamp | ||
| 1980-01-31 | 1980-01 | 1686 |
| 1980-02-29 | 1980-02 | 1591 |
| 1980-03-31 | 1980-03 | 2304 |
| 1980-04-30 | 1980-04 | 1712 |
| 1980-05-31 | 1980-05 | 1471 |
| ... | ... | ... |
| 1995-03-31 | 1995-03 | 1897 |
| 1995-04-30 | 1995-04 | 1862 |
| 1995-05-31 | 1995-05 | 1670 |
| 1995-06-30 | 1995-06 | 1688 |
| 1995-07-31 | 1995-07 | 2031 |
187 rows × 2 columns
# axis=1 → refers to dropping a column
# inplace=True → applies the change directly to df
df.drop(['YearMonth'], axis=1, inplace=True) # Drop the 'YearMonth' column from the dataframe
print(df.shape)
(187, 1)
# Confirm that the YearMonth column was successfully dropped
df.head()
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1980-01-31 | 1686 |
| 1980-02-29 | 1591 |
| 1980-03-31 | 2304 |
| 1980-04-30 | 1712 |
| 1980-05-31 | 1471 |
Checking for missing values¶
# Check the dataframe for missing values
# This returns True for each cell that contains a missing value, otherwise False
df.isnull()
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1980-01-31 | False |
| 1980-02-29 | False |
| 1980-03-31 | False |
| 1980-04-30 | False |
| 1980-05-31 | False |
| ... | ... |
| 1995-03-31 | False |
| 1995-04-30 | False |
| 1995-05-31 | False |
| 1995-06-30 | False |
| 1995-07-31 | False |
187 rows × 1 columns
# Count the total number of missing values per column to confirm there are none
df.isnull().sum()
| 0 | |
|---|---|
| Sparkling | 0 |
df.head()
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1980-01-31 | 1686 |
| 1980-02-29 | 1591 |
| 1980-03-31 | 2304 |
| 1980-04-30 | 1712 |
| 1980-05-31 | 1471 |
Exploratory Data Analysis (EDA)¶
Univariate Analysis¶
# Set the figure size for better readability
plt.figure(figsize=(12,6))
# Plot the distribution of Sparkling Wine sales
# bins=20 → number of bars (distribution granularity)
# skyblue bars with black edges for contrast
plt.hist(df['Sparkling'], bins=20, color='skyblue', edgecolor='black')
# Title describing the chart
plt.title("Distribution of Sparkling Wine Sales", fontsize=16)
# X-axis shows numeric sales values (univariate focus)
plt.xlabel("Number of Sales", fontsize=12)
# Y-axis shows how often each sales value occurred
plt.ylabel("Frequency", fontsize=12)
# Light horizontal grid on the Y-axis only, for a clean look
plt.grid(axis='y', linestyle='--', alpha=0.5)
# Adjust layout to prevent label overlap
plt.tight_layout()
# Display the final plot
plt.show()
plt.figure(figsize=(8,5))
sns.boxplot(y=df['Sparkling'], color='lightcoral')
plt.title("Boxplot of Sparkling Wine Sales", fontsize=16)
plt.ylabel("Number of Sales", fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
Observations¶
The histogram shows that most of the sales of Sparkling Wine occur between 1,400 and 2,200 units. Fewer months have much higher sales, reflecting a right-skewed distribution with some periods of high demand.
The boxplot shows that generally, the sales of Sparkling Wine range between 1,500 and 2,400 units with a median of around 1,900. The high value outliers are quite numerous, indicating periodic strong seasonal peaks in demand.
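The right skew described above can also be quantified numerically. This is a minimal sketch using a small toy sample standing in for `df['Sparkling']` (the values are drawn from the tables shown earlier):

```python
import pandas as pd

# Toy right-skewed sample standing in for df['Sparkling']
sales = pd.Series([1471, 1591, 1686, 1712, 2304, 4087, 5179])

# A positive skewness statistic confirms the long right tail
# seen in the histogram; values well above 0 indicate strong skew
print(round(sales.skew(), 2))
```

On the full series, `df['Sparkling'].skew()` would give the same style of check without plotting.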
Bivariate Analysis¶
Time_Stamp vs Sparkling Sales¶
plt.figure(figsize=(12, 6)) # Set the figure size for better visibility
plt.plot(df, linewidth=2) # Plot the dataframe values with a thicker line
plt.title('Sparkling Sales Over Time', fontsize=14) # Add a title to the plot
plt.xlabel('Time', fontsize=12) # Label the x-axis as Time
plt.ylabel('Sales', fontsize=12) # Label the y-axis as Sales
plt.grid(True) # Add a grid to make the chart easier to read
plt.show() # Display the plot
# Group data by year and calculate total yearly sales
yearly_sales = df['Sparkling'].groupby(df.index.year).sum()
plt.figure(figsize=(12, 6)) # Bigger chart for clear visibility
# Rainbow gradient: np.linspace spreads the years evenly across the colormap
# (passing raw integers would index the colormap's 0-255 lookup table instead)
colors = plt.cm.rainbow(np.linspace(0, 1, len(yearly_sales)))
plt.bar(yearly_sales.index, yearly_sales.values, color=colors)
# Labels and title
plt.title('Total Sparkling Wine Sales per Year', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Sales', fontsize=12)
# Display values on bars for better insight
for i, v in enumerate(yearly_sales.values):
plt.text(yearly_sales.index[i], v, str(round(v, 2)),
ha='center', va='bottom', fontsize=9)
plt.grid(axis='y', linestyle='--', alpha=0.6) # Subtle grid for readability
plt.show()
Sparkling Sales by Year¶
# x = df.index.year extracts the year from the DateTime index
# y = df['Sparkling'] selects the Sparkling sales column to plot
sns.boxplot(x=df.index.year, y=df['Sparkling'])
# Add labels and title for better readability
plt.xlabel('Year', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)
plt.title('Yearly Distribution of Sparkling Sales', fontsize=14)
# Rotate x-axis labels and optionally reduce number of ticks
plt.xticks(rotation=45) # Rotate year labels so they don’t overlap
plt.grid(True) # Add grid to make the visualization cleaner
# Show the plot
plt.show()
Sparkling Sales by Month¶
# x = df.index.month_name() extracts month names from the DateTime index
# y = df['Sparkling'] selects the Sparkling sales column to plot
sns.boxplot(x=df.index.month_name(), y=df['Sparkling'])
# Add labels and title for better understanding
plt.xlabel('Month', fontsize=12) # Label for x-axis
plt.ylabel('Sparkling Sales', fontsize=12) # Label for y-axis
plt.title('Monthly Distribution of Sparkling Sales', fontsize=14) # Plot title
# Rotate x-axis labels to avoid clutter since there are 12 long month names
plt.xticks(rotation=45, ha='right') # ha='right' aligns labels nicely
plt.grid(True) # Add grid for improved readability
# Display the plot
plt.show()
Monthly Sales Across Years¶
monthly_sales_across_years = pd.pivot_table(df, values = 'Sparkling', columns = df.index.month_name(), index = df.index.year)
monthly_sales_across_years
| Time_Stamp | April | August | December | February | January | July | June | March | May | November | October | September |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time_Stamp | ||||||||||||
| 1980 | 1712.0 | 2453.0 | 5179.0 | 1591.0 | 1686.0 | 1966.0 | 1377.0 | 2304.0 | 1471.0 | 4087.0 | 2596.0 | 1984.0 |
| 1981 | 1976.0 | 2472.0 | 4551.0 | 1523.0 | 1530.0 | 1781.0 | 1480.0 | 1633.0 | 1170.0 | 3857.0 | 2273.0 | 1981.0 |
| 1982 | 1790.0 | 1897.0 | 4524.0 | 1329.0 | 1510.0 | 1954.0 | 1449.0 | 1518.0 | 1537.0 | 3593.0 | 2514.0 | 1706.0 |
| 1983 | 1375.0 | 2298.0 | 4923.0 | 1638.0 | 1609.0 | 1600.0 | 1245.0 | 2030.0 | 1320.0 | 3440.0 | 2511.0 | 2191.0 |
| 1984 | 1789.0 | 3159.0 | 5274.0 | 1435.0 | 1609.0 | 1597.0 | 1404.0 | 2061.0 | 1567.0 | 4273.0 | 2504.0 | 1759.0 |
| 1985 | 1589.0 | 2512.0 | 5434.0 | 1682.0 | 1771.0 | 1645.0 | 1379.0 | 1846.0 | 1896.0 | 4388.0 | 3727.0 | 1771.0 |
| 1986 | 1605.0 | 3318.0 | 5891.0 | 1523.0 | 1606.0 | 2584.0 | 1403.0 | 1577.0 | 1765.0 | 3987.0 | 2349.0 | 1562.0 |
| 1987 | 1935.0 | 1930.0 | 7242.0 | 1442.0 | 1389.0 | 1847.0 | 1250.0 | 1548.0 | 1518.0 | 4405.0 | 3114.0 | 2638.0 |
| 1988 | 2336.0 | 1645.0 | 6757.0 | 1779.0 | 1853.0 | 2230.0 | 1661.0 | 2108.0 | 1728.0 | 4988.0 | 3740.0 | 2421.0 |
| 1989 | 1650.0 | 1968.0 | 6694.0 | 1394.0 | 1757.0 | 1971.0 | 1406.0 | 1982.0 | 1654.0 | 4514.0 | 3845.0 | 2608.0 |
| 1990 | 1628.0 | 1605.0 | 6047.0 | 1321.0 | 1720.0 | 1899.0 | 1457.0 | 1859.0 | 1615.0 | 4286.0 | 3116.0 | 2424.0 |
| 1991 | 1279.0 | 1857.0 | 6153.0 | 2049.0 | 1902.0 | 2214.0 | 1540.0 | 1874.0 | 1432.0 | 3627.0 | 3252.0 | 2408.0 |
| 1992 | 1997.0 | 1773.0 | 6119.0 | 1667.0 | 1577.0 | 2076.0 | 1625.0 | 1993.0 | 1783.0 | 4096.0 | 3088.0 | 2377.0 |
| 1993 | 2121.0 | 2795.0 | 6410.0 | 1564.0 | 1494.0 | 2048.0 | 1515.0 | 1898.0 | 1831.0 | 4227.0 | 3339.0 | 1749.0 |
| 1994 | 1725.0 | 1495.0 | 5999.0 | 1968.0 | 1197.0 | 2031.0 | 1693.0 | 1720.0 | 1674.0 | 3729.0 | 3385.0 | 2968.0 |
| 1995 | 1862.0 | NaN | NaN | 1402.0 | 1070.0 | 2031.0 | 1688.0 | 1897.0 | 1670.0 | NaN | NaN | NaN |
monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');
Observations¶
In the sales of Sparkling Wine, a clear yearly seasonal pattern can be seen, with sharp peaks and an upward trend across the years.
Sales generally rose over the years, peaking in about 1988 with the highest total, which confirms long-term demand growth.
Most years show stable sales levels, while several years contain high outliers, indicating seasonal surges in demand such as holiday periods.
Sales are always highest in December, confirming strong seasonality, and lowest between May and August.
December exhibits the most powerful growth throughout the years, while the other months are quite stable with gradual increases, reinforcing a strong seasonal effect at year-end.
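The December dominance noted above can be confirmed with a simple per-month aggregation. This is a hedged sketch on a two-year toy series with a December spike, standing in for `df` (names and values are illustrative):

```python
import pandas as pd

# Two years of toy monthly sales with a December spike, standing in for df
idx = pd.date_range("1980-01-01", periods=24, freq="MS")  # month-start stamps
vals = [1500] * 11 + [5000] + [1600] * 11 + [5400]
s = pd.Series(vals, index=idx)

# Average sales per calendar month; the December mean dominates
monthly_means = s.groupby(s.index.month).mean()
print(monthly_means.idxmax())  # → 12
```

On the real data, `df['Sparkling'].groupby(df.index.month).mean()` gives the same ranking and makes the seasonal ordering explicit in numbers rather than boxplots.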
Decomposition¶
Additive¶
# Additive Decomposition
# This splits the time series into Trend, Seasonal, and Residual components
decomposition = seasonal_decompose(df, model='additive')
# Plot the decomposed components (Observed, Trend, Seasonal, Residual)
# decomposition.plot() creates its own figure, so resize that figure directly
# (calling plt.figure() beforehand would leave an empty extra figure behind)
fig = decomposition.plot()
fig.set_size_inches(12, 8) # Increase figure size for better readability
fig.suptitle('Additive Seasonal Decomposition of Sparkling Sales', fontsize=16) # Add a title to the entire figure
plt.tight_layout() # Adjust layout to avoid overlapping elements
plt.show() # Display the full decomposition plot
# Extract decomposition components
trend = decomposition.trend # Long-term pattern
seasonality = decomposition.seasonal # Recurring seasonal effects
residual = decomposition.resid # Noise/irregular component
# Display a preview of the first 12 values from each component
print("🔹 Trend Component (first 12 values):")
print(trend.head(12))
print("\n" + "-"*50 + "\n") # Add separator for readability
print("🔹 Seasonal Component (first 12 values):")
print(seasonality.head(12))
print("\n" + "-"*50 + "\n")
print("🔹 Residual Component (first 12 values):")
print(residual.head(12))
print("\n" + "-"*50)
# Check for missing values in each component (important before modeling)
print("\n✅ Missing Values Count:")
print("Trend:", trend.isna().sum())
print("Seasonality:", seasonality.isna().sum())
print("Residual:", residual.isna().sum())
🔹 Trend Component (first 12 values):
Time_Stamp
1980-01-31            NaN
1980-02-29            NaN
1980-03-31            NaN
1980-04-30            NaN
1980-05-31            NaN
1980-06-30            NaN
1980-07-31    2360.666667
1980-08-31    2351.333333
1980-09-30    2320.541667
1980-10-31    2303.583333
1980-11-30    2302.041667
1980-12-31    2293.791667
Name: trend, dtype: float64

--------------------------------------------------

🔹 Seasonal Component (first 12 values):
Time_Stamp
1980-01-31    -854.260599
1980-02-29    -830.350678
1980-03-31    -592.356630
1980-04-30    -658.490559
1980-05-31    -824.416154
1980-06-30    -967.434011
1980-07-31    -465.502265
1980-08-31    -214.332821
1980-09-30    -254.677265
1980-10-31     599.769957
1980-11-30    1675.067179
1980-12-31    3386.983846
Name: seasonal, dtype: float64

--------------------------------------------------

🔹 Residual Component (first 12 values):
Time_Stamp
1980-01-31           NaN
1980-02-29           NaN
1980-03-31           NaN
1980-04-30           NaN
1980-05-31           NaN
1980-06-30           NaN
1980-07-31     70.835599
1980-08-31    315.999487
1980-09-30    -81.864401
1980-10-31   -307.353290
1980-11-30    109.891154
1980-12-31   -501.775513
Name: resid, dtype: float64

--------------------------------------------------

✅ Missing Values Count:
Trend: 12
Seasonality: 0
Residual: 12
Checking for additive assumptions¶
Residual Component ( Additive )¶
# Calculate the mean of the residual component
# If the model decomposition is correct, the residual mean should be close to zero
residual_mean = residual.mean()
# Print the residual mean value
print("Average of Residual Component:", residual_mean)
Average of Residual Component: -1.2088458994707376
# Normality distribution check of the residual component
from scipy.stats import shapiro
# Plot histogram + KDE curve to visually assess normality
plt.figure(figsize=(10,5)) # Bigger for clearer view
sns.histplot(residual.dropna(), kde=True) # dropna() to remove missing values
# Add title and axis labels for better readability
plt.title('Additive Residuals Distribution (Normality Check)')
plt.xlabel('Residual Values')
plt.ylabel('Frequency')
plt.grid(True) # Add grid for clarity
plt.show()
Shapiro-Wilk test for Normality on the Residuals ( Additive )¶
# dropna() removes any missing values that can cause errors in the test
stat, p_value = shapiro(residual.dropna())
# Print the statistical results
print("Shapiro-Wilk Test Statistic:", stat)
print("p-value:", p_value)
# Interpretation of the p-value
# If p-value > 0.05 → residuals are likely normally distributed (good for forecasting models)
# If p-value <= 0.05 → residuals deviate from normality (model assumptions may need revisiting)
if p_value > 0.05:
print("✅ Residuals appear to follow a normal distribution.")
else:
print("⚠ Residuals are NOT normally distributed.")
Shapiro-Wilk Test Statistic: 0.9833056372432126
p-value: 0.03433683203800602
⚠ Residuals are NOT normally distributed.
Multiplicative¶
# Multiplicative Decomposition
# Use this when seasonal fluctuations grow/shrink with the level of the series
decomposition = seasonal_decompose(df, model='multiplicative')
# Plot the observed, trend, seasonal, and residual components
decomposition.plot()
plt.suptitle('Multiplicative Seasonal Decomposition of Sparkling Sales', fontsize=14)
plt.tight_layout()
plt.show()
# Extract components from the multiplicative decomposition
trend = decomposition.trend # Long-term movement in the data
seasonality = decomposition.seasonal # Seasonal repeating pattern over time
residual = decomposition.resid # Noise/irregular component after removing trend and seasonality
# Display the first 12 entries for each component for inspection
print("🔹 Trend Component (first 12 values):")
print(trend.head(12)) # Check how the trend evolves over early months
print("\n" + "-"*50 + "\n") # Visual separator
print("🔹 Seasonal Component (first 12 values):")
print(seasonality.head(12)) # Helps understand monthly seasonal effect
print("\n" + "-"*50 + "\n")
print("🔹 Residual Component (first 12 values):")
print(residual.head(12)) # Should ideally look like random noise
print("\n" + "-"*50)
# Check for missing values in each component (common at the boundaries)
print("\n✅ Missing Values Count:")
print("Trend:", trend.isna().sum())
print("Seasonality:", seasonality.isna().sum())
print("Residual:", residual.isna().sum())
🔹 Trend Component (first 12 values):
Time_Stamp
1980-01-31            NaN
1980-02-29            NaN
1980-03-31            NaN
1980-04-30            NaN
1980-05-31            NaN
1980-06-30            NaN
1980-07-31    2360.666667
1980-08-31    2351.333333
1980-09-30    2320.541667
1980-10-31    2303.583333
1980-11-30    2302.041667
1980-12-31    2293.791667
Name: trend, dtype: float64

--------------------------------------------------

🔹 Seasonal Component (first 12 values):
Time_Stamp
1980-01-31    0.649843
1980-02-29    0.659214
1980-03-31    0.757440
1980-04-30    0.730351
1980-05-31    0.660609
1980-06-30    0.603468
1980-07-31    0.809164
1980-08-31    0.918822
1980-09-30    0.894367
1980-10-31    1.241789
1980-11-30    1.690158
1980-12-31    2.384776
Name: seasonal, dtype: float64

--------------------------------------------------

🔹 Residual Component (first 12 values):
Time_Stamp
1980-01-31         NaN
1980-02-29         NaN
1980-03-31         NaN
1980-04-30         NaN
1980-05-31         NaN
1980-06-30         NaN
1980-07-31    1.029230
1980-08-31    1.135407
1980-09-30    0.955954
1980-10-31    0.907513
1980-11-30    1.050423
1980-12-31    0.946770
Name: resid, dtype: float64

--------------------------------------------------

✅ Missing Values Count:
Trend: 12
Seasonality: 0
Residual: 12
Checking for Multiplicative assumptions¶
Residual Component ( Multiplicative )¶
# Extract the residual (error) component from the decomposition
residual = decomposition.resid
# Calculate the mean value of the residuals
# In a multiplicative model the residuals are ratios, so a good
# decomposition gives a residual mean very close to one (not zero)
residual_mean = residual.mean()
# Print the mean of the residuals
print("Mean of Residual Component:", residual_mean)
Mean of Residual Component: 0.9997456359115032
# Plot the distribution of the residual values to visually check if they follow a normal distribution
plt.figure(figsize=(10, 5)) # Increase figure size to improve readability
sns.histplot(residual.dropna(), kde=True)
# dropna() removes missing values
# kde=True overlays a smooth probability curve to visualize the shape of distribution
# Add plot title and axis labels for explanation
plt.title('Multiplicative Residuals Distribution (Normality Check)', fontsize=14)
plt.xlabel('Residual Values', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.grid(True) # Add a grid for easier interpretation
plt.show() # Display the plot
Shapiro-Wilk test for Normality on the Residuals ( Multiplicative )¶
# Perform the Shapiro-Wilk test to check if residuals follow a normal distribution under Multiplicative
# dropna() removes missing values that can cause errors in the test
stat, p_value = shapiro(residual.dropna())
# Display the test results
print("Shapiro-Wilk Test Statistic:", stat)
print("p-value:", p_value)
# Interpretation of the p-value result
# If p-value > 0.05 → residuals are likely normally distributed (good for forecasting models)
# If p-value <= 0.05 → residuals deviate from normality (assumptions may need review)
if p_value > 0.05:
print("✅ Residuals appear to follow a normal distribution.")
else:
print("⚠ Residuals are NOT normally distributed.")
Shapiro-Wilk Test Statistic: 0.9859993833004654
p-value: 0.07803310394786477
✅ Residuals appear to follow a normal distribution.
df.describe()
| Sparkling | |
|---|---|
| count | 187.000000 |
| mean | 2402.417112 |
| std | 1295.111540 |
| min | 1070.000000 |
| 25% | 1605.000000 |
| 50% | 1874.000000 |
| 75% | 2549.000000 |
| max | 7242.000000 |
Observations¶
Additive
There is a clear seasonal pattern every year, with an increasing trend over time and fluctuations captured well in the seasonal component.
Residuals show a rather centered distribution but with evident skewness and heavy tails.
Since p < 0.05, the residuals are not normally distributed; the additive model therefore does not adequately describe the variability in the series.
Multiplicative
Peaks in seasons expand proportionally with the trend, reflecting that the seasonal effects rise as the overall sales increase for which multiplicative seasonality is more suitable.
Residuals are more symmetric and closer to a bell-shaped distribution than those from the additive model.
Since the p-value > 0.05, the residuals are approximately normally distributed, indicating that the multiplicative model fits better.
Conclusion:
- Based on the tests for decomposition and residual normality, the multiplicative model fits better for Sparkling Wine sales since seasonal variations grow with the increase in the overall level of sales.
Data Preparation for Modeling¶
# Year
df.index.year.unique()
Index([1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
1992, 1993, 1994, 1995],
dtype='int32', name='Time_Stamp')
# Month
df.index.month.unique()
Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype='int32', name='Time_Stamp')
Train / Test Split¶
df_train = df[df.index <= "1991-12-31"] # Training data up to end of 1991
df_test = df[df.index > "1991-12-31"] # Testing data from 1992 onward
# Print the shape (rows, columns) of the training set
# This helps verify how much data was assigned to the training portion
print("Training Set Shape:", df_train.shape)
# Print the shape (rows, columns) of the testing set
# This ensures the test period contains data after 1991
print("Testing Set Shape:", df_test.shape)
Training Set Shape: (144, 1)
Testing Set Shape: (43, 1)
Train Dataset after Splitting¶
df_train.head()
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1980-01-31 | 1686 |
| 1980-02-29 | 1591 |
| 1980-03-31 | 2304 |
| 1980-04-30 | 1712 |
| 1980-05-31 | 1471 |
Test Dataset after Splitting¶
df_test.head()
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1992-01-31 | 1577 |
| 1992-02-29 | 1667 |
| 1992-03-31 | 1993 |
| 1992-04-30 | 1997 |
| 1992-05-31 | 1783 |
Model Building¶
Linear Regression Model¶
# Create a sequence of time instances for the training dataset
# Adding +1 so the sequence starts from 1
train_time = [i + 1 for i in range(len(df_train))]
# Create a sequence of time instances for the test dataset
# Offsetting by len(df_train) + 1 (here 145) so the test sequence
# continues directly after the training period with no gap
test_time = [i + len(df_train) + 1 for i in range(len(df_test))]
# Display the generated time index ranges
print("Training Time Instances:", train_time[:10], "...") # Print first 10 values only for cleaner output
print("Total Training Time Points:", len(train_time))
print("\nTest Time Instances:", test_time[:10], "...") # Print first 10 values for preview
print("Total Test Time Points:", len(test_time))
Training Time Instances: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] ...
Total Training Time Points: 144

Test Time Instances: [145, 146, 147, 148, 149, 150, 151, 152, 153, 154] ...
Total Test Time Points: 43
# Create copies of the train and test dataframes
# This ensures that any modifications we make for Linear Regression modeling
# (like adding features or transformations) do NOT affect the original data
LinearRegression_train = df_train.copy()
LinearRegression_test = df_test.copy()
# Confirm that separate datasets were successfully created
print("Training DataFrame for Linear Regression:", LinearRegression_train.shape)
print("Testing DataFrame for Linear Regression:", LinearRegression_test.shape)
Training DataFrame for Linear Regression: (144, 1)
Testing DataFrame for Linear Regression: (43, 1)
# Add the numeric time index as a feature in the training and test dataframes
# This will be the independent variable for Linear Regression
LinearRegression_train['time'] = train_time
LinearRegression_test['time'] = test_time
# Display first few rows of training data to confirm the new 'time' column was added correctly
print("📌 First few rows of Training Data:")
print(LinearRegression_train.head())
print("\n" + "-"*60 + "\n") # Separator
# Display last few rows of training data to ensure correct alignment of time feature
print("📌 Last few rows of Training Data:")
print(LinearRegression_train.tail())
print("\n" + "-"*60 + "\n")
# Display first few rows of test data to verify correct continuation of time feature
print("📌 First few rows of Test Data:")
print(LinearRegression_test.head())
print("\n" + "-"*60 + "\n")
# Display last few rows of test data
print("📌 Last few rows of Test Data:")
print(LinearRegression_test.tail())
📌 First few rows of Training Data:
Sparkling time
Time_Stamp
1980-01-31 1686 1
1980-02-29 1591 2
1980-03-31 2304 3
1980-04-30 1712 4
1980-05-31 1471 5
------------------------------------------------------------
📌 Last few rows of Training Data:
Sparkling time
Time_Stamp
1991-08-31 1857 140
1991-09-30 2408 141
1991-10-31 3252 142
1991-11-30 3627 143
1991-12-31 6153 144
------------------------------------------------------------
📌 First few rows of Test Data:
Sparkling time
Time_Stamp
1992-01-31 1577 145
1992-02-29 1667 146
1992-03-31 1993 147
1992-04-30 1997 148
1992-05-31 1783 149
------------------------------------------------------------
📌 Last few rows of Test Data:
Sparkling time
Time_Stamp
1995-03-31 1897 183
1995-04-30 1862 184
1995-05-31 1670 185
1995-06-30 1688 186
1995-07-31 2031 187
# Initialize the model; it will learn the linear relationship between the numeric time index and Sparkling sales
lr = LinearRegression()
print("Linear Regression Model Initialized ✅")
Linear Regression Model Initialized ✅
LinearRegression_train['Sparkling'].values
array([1686, 1591, 2304, 1712, 1471, 1377, 1966, 2453, 1984, 2596, 4087,
5179, 1530, 1523, 1633, 1976, 1170, 1480, 1781, 2472, 1981, 2273,
3857, 4551, 1510, 1329, 1518, 1790, 1537, 1449, 1954, 1897, 1706,
2514, 3593, 4524, 1609, 1638, 2030, 1375, 1320, 1245, 1600, 2298,
2191, 2511, 3440, 4923, 1609, 1435, 2061, 1789, 1567, 1404, 1597,
3159, 1759, 2504, 4273, 5274, 1771, 1682, 1846, 1589, 1896, 1379,
1645, 2512, 1771, 3727, 4388, 5434, 1606, 1523, 1577, 1605, 1765,
1403, 2584, 3318, 1562, 2349, 3987, 5891, 1389, 1442, 1548, 1935,
1518, 1250, 1847, 1930, 2638, 3114, 4405, 7242, 1853, 1779, 2108,
2336, 1728, 1661, 2230, 1645, 2421, 3740, 4988, 6757, 1757, 1394,
1982, 1650, 1654, 1406, 1971, 1968, 2608, 3845, 4514, 6694, 1720,
1321, 1859, 1628, 1615, 1457, 1899, 1605, 2424, 3116, 4286, 6047,
1902, 2049, 1874, 1279, 1432, 1540, 2214, 1857, 2408, 3252, 3627,
6153])
# Train the Linear Regression model using the 'time' feature as the predictor
# 'time' (independent variable) → numerical representation of time progression
# 'Sparkling' (dependent variable) → target sales values we want to predict
lr.fit(LinearRegression_train[['time']], LinearRegression_train['Sparkling'].values)
# Display confirmation message
print("✅ Linear Regression model has been successfully trained.")
✅ Linear Regression model has been successfully trained.
# Use the trained model to make predictions on the test dataset
test_prediction_model = lr.predict(LinearRegression_test[['time']])
# Store the predictions in a new column for comparison
LinearRegression_test['RegOnTime'] = test_prediction_model
# Plot actual vs predicted values to visually evaluate performance
plt.figure(figsize=(14, 7)) # Larger figure for clear visibility
# Plot training data values
plt.plot(df_train['Sparkling'], label='Training Data', linewidth=2)
# Plot actual testing data values
plt.plot(df_test['Sparkling'], label='Actual Test Data', linewidth=2)
# Plot predicted values from Linear Regression model
plt.plot(LinearRegression_test['RegOnTime'],
label='Predicted Test Data (Linear Regression)',
linestyle='--', linewidth=2)
# Add title and labels
plt.title('Linear Regression Forecast vs Actual Sparkling Sales', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)
# Add legend and grid for better readability
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout() # Ensure no labels are cut off
plt.show() # Display the plot
# Calculate Root Mean Squared Error (RMSE) for model predictions on the test set
# root_mean_squared_error returns RMSE directly; it replaces
# mean_squared_error(..., squared=False), whose 'squared' keyword
# was removed in newer scikit-learn versions
rmse_model_test = root_mean_squared_error(
    df_test['Sparkling'],   # Actual values
    test_prediction_model)  # Predicted values
# Display the calculated RMSE in a well-formatted print statement
print(f"✅ For Linear Regression (Time-Based) forecast on Test Data, RMSE ≈ {rmse_model_test:.3f}")
✅ For Linear Regression (Time-Based) forecast on Test Data, RMSE ≈ 1356.625
# Create a DataFrame to store and display model performance results
# This allows easy comparison later when multiple models (like ARIMA, SARIMA, Prophet) are added
resultsDf = pd.DataFrame(
{'Test RMSE': [rmse_model_test]}, # Store RMSE value
index=['Regression On Time Model'] # Label row with model name
)
# Display the performance summary table
print("📊 Model Performance Summary:")
resultsDf
📊 Model Performance Summary:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1356.625 |
Observations:
The Linear Regression model does not capture the strong seasonal peaks in the sales of Sparkling Wine.
Its predictions form a nearly straight increasing line, resulting in a very high RMSE value.
This indicates that Linear Regression is not suitable for the modeling of this highly seasonal time series.
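One standard way to let a linear model capture this kind of seasonality is to add one-hot month dummies alongside the time index. This is a hedged sketch on a toy series with a December spike, not the notebook's model; all names (`X`, `gap`, etc.) are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy monthly series: linear trend plus a fixed December spike
idx = pd.date_range("1980-01-01", periods=48, freq="MS")
y = pd.Series(np.arange(48) * 5.0 + 1500.0, index=idx)
y[idx.month == 12] += 3000  # seasonal peak

# Features: a linear time index plus one-hot month dummies
X = pd.get_dummies(pd.Series(idx.month).astype(str), prefix="m")
X["time"] = np.arange(1, 49)

model = LinearRegression().fit(X, y)
pred = pd.Series(model.predict(X), index=idx)

# With the dummies, the fitted December level sits well above January
gap = pred[idx.month == 12].mean() - pred[idx.month == 1].mean()
print(round(gap))  # ≈ 3055 (the 3000 spike plus 11 months of trend)
```

Plain regression on time alone would flatten that gap to roughly the trend difference, which is exactly the failure mode observed above.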
Naive Approach Model¶
# Create copies of the original train and test datasets
# These copies will be used to implement the Naive Forecasting model
NaiveModel_train = df_train.copy()
NaiveModel_test = df_test.copy()
print("✅ Copies created for Naive Model:")
print("Train set shape:", NaiveModel_train.shape)
print("Test set shape:", NaiveModel_test.shape)
✅ Copies created for Naive Model:
Train set shape: (144, 1)
Test set shape: (43, 1)
# Display the first few rows of the Naive Model test dataset
# This helps verify that the copied dataset structure looks correct
print("📌 First few rows of Naive Test Data:")
NaiveModel_test.head()
📌 First few rows of Naive Test Data:
| Sparkling | |
|---|---|
| Time_Stamp | |
| 1992-01-31 | 1577 |
| 1992-02-29 | 1667 |
| 1992-03-31 | 1993 |
| 1992-04-30 | 1997 |
| 1992-05-31 | 1783 |
# Naive Forecast:
# The prediction for every value in the test set is simply
# the *last observed value* from the training period
# Extract the last observed training value
last_train_value = df_train['Sparkling'].iloc[-1]
# Assign the same value to the entire test set as the naive forecast
NaiveModel_test['naive'] = last_train_value
# Show the first few predictions to confirm
print("📌 Naive Forecast - First few predictions:")
NaiveModel_test['naive'].head()
📌 Naive Forecast - First few predictions:
| naive | |
|---|---|
| Time_Stamp | |
| 1992-01-31 | 6153 |
| 1992-02-29 | 6153 |
| 1992-03-31 | 6153 |
| 1992-04-30 | 6153 |
| 1992-05-31 | 6153 |
# Display the first few rows of the test dataset including Naive forecasts
print("📌 First few rows of NaiveModel_test (with naive predictions):")
NaiveModel_test.head()
📌 First few rows of NaiveModel_test (with naive predictions):
| Sparkling | naive | |
|---|---|---|
| Time_Stamp | ||
| 1992-01-31 | 1577 | 6153 |
| 1992-02-29 | 1667 | 6153 |
| 1992-03-31 | 1993 | 6153 |
| 1992-04-30 | 1997 | 6153 |
| 1992-05-31 | 1783 | 6153 |
# Visualizing the Naive Forecast model performance
plt.figure(figsize=(14, 7)) # Larger size for better visibility
# Plot training actual sales
plt.plot(NaiveModel_train['Sparkling'],
label='Training Data', linewidth=2)
# Plot actual test sales
plt.plot(df_test['Sparkling'],
label='Actual Test Data', linewidth=2)
# Plot naive predictions on test data
plt.plot(NaiveModel_test['naive'],
label='Naive Forecast (Test Data)', linestyle='--', linewidth=2)
# Adding title and axis labels for clarity
plt.title('Naive Forecast vs Actual Sparkling Sales', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sales', fontsize=12)
# Add legend in best location & grid for better readability
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
# Model Evaluation for Naive Forecast
# NOTE: mean_squared_error returns MSE here (newer scikit-learn removed the
# squared=False keyword); the models below use root_mean_squared_error instead
rmse_model_test = metrics.mean_squared_error(
    df_test['Sparkling'],         # Actual test values
    NaiveModel_test['naive'])     # Naive forecast values
# Display the error score in a clean and formatted way
print(f"✅ Naive Forecast RMSE on Test Data: {rmse_model_test:.3f}")
✅ Naive Forecast RMSE on Test Data: 15839720.953
resultsDfN = pd.DataFrame({'Test RMSE': [rmse_model_test]}, index=['NaiveForecast'])
resultsDf = pd.concat([resultsDf, resultsDfN])
# Display the updated performance results
print("📊 Updated Model Performance Table:")
resultsDf
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
Simple Average Model¶
SimpleAverage_train = df_train.copy()
SimpleAverage_test = df_test.copy()
# Confirm the copies are successfully created
print("Training set shape:", SimpleAverage_train.shape)
print("Testing set shape:", SimpleAverage_test.shape)
Training set shape: (144, 1) Testing set shape: (43, 1)
# Simple Average Forecast Model
# Every forecasted value = average of the training data Sparkling Sales
# Compute the mean sales from the training dataset
SimpleAverage_test['mean forecast'] = df_train['Sparkling'].mean()
# Display the first few rows to confirm the forecast column is added correctly
print("📌 First few rows of the Simple Average Test Data:")
SimpleAverage_test.head()
📌 First few rows of the Simple Average Test Data:
| Sparkling | mean forecast | |
|---|---|---|
| Time_Stamp | ||
| 1992-01-31 | 1577 | 2408.930556 |
| 1992-02-29 | 1667 | 2408.930556 |
| 1992-03-31 | 1993 | 2408.930556 |
| 1992-04-30 | 1997 | 2408.930556 |
| 1992-05-31 | 1783 | 2408.930556 |
# Plot the Simple Average Forecast model results
plt.figure(figsize=(14, 7)) # Bigger size for better visualization
# Plot actual training data
plt.plot(df_train['Sparkling'],
label='Training Data',
linewidth=2)
# Plot actual test data
plt.plot(df_test['Sparkling'],
label='Actual Test Data',
linewidth=2)
# Plot predicted values from Simple Average model
plt.plot(SimpleAverage_test['mean forecast'],
label='Simple Average Forecast (Test Data)',
linestyle='--', linewidth=2)
# Title and axis labels
plt.title("Simple Average Forecast vs Actual Sparkling Sales", fontsize=16)
plt.xlabel("Time", fontsize=12)
plt.ylabel("Sales", fontsize=12)
# Add legend and background grid for clarity
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout() # Avoid layout cut-offs
plt.show()
# Model Evaluation for Simple Average Forecast
# root_mean_squared_error returns RMSE (Root Mean Squared Error) directly
rmse_model_test = root_mean_squared_error(
df_test['Sparkling'], # Actual test values
SimpleAverage_test['mean forecast'] # Forecasted values
)
# Print RMSE result in a clean formatted way
print(f"✅ RMSE for Simple Average Forecast on Test Data: {rmse_model_test:.3f}")
✅ RMSE for Simple Average Forecast on Test Data: 1268.683
# Create a performance entry for the Simple Average model
resultsDf_SA = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},
    index=['SimpleAverageModel']  # Model name label
)

# Append it to the main model comparison table
resultsDf = pd.concat([resultsDf, resultsDf_SA])
# Display the updated results
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
Simple Exponential Smoothing Model¶
# Train and test dataset for Simple Exponential Smoothing (SES) model
SES_train = df_train.copy()
SES_test = df_test.copy()
print("✅ SES datasets created:")
print("Train shape:", SES_train.shape)
print("Test shape:", SES_test.shape)
✅ SES datasets created: Train shape: (144, 1) Test shape: (43, 1)
# Initialize the Simple Exponential Smoothing (SES) model
# Using the Sparkling sales from the training dataset as the time series input
model_SES = SimpleExpSmoothing(
SES_train['Sparkling'].astype('float') # Ensure numeric type for the model
)
print("✅ SES model initialized successfully")
✅ SES model initialized successfully
# Fit the SES model to the training data
# optimized=True automatically selects the best smoothing parameter (alpha)
model_SES_autofit = model_SES.fit(optimized=True)
# Print model summary to review smoothing level and fit statistics
print("✅ SES model successfully fitted with optimized smoothing parameter:")
print(model_SES_autofit.summary())
✅ SES model successfully fitted with optimized smoothing parameter:
SimpleExpSmoothing Model Results
==============================================================================
Dep. Variable: Sparkling No. Observations: 144
Model: SimpleExpSmoothing SSE 250114511.829
Optimized: True AIC 2072.937
Trend: None BIC 2078.876
Seasonal: None AICC 2073.224
Seasonal Periods: None Date: Sun, 02 Nov 2025
Box-Cox: False Time: 13:44:21
Box-Cox Coeff.: None
==============================================================================
coeff code optimized
------------------------------------------------------------------------------
smoothing_level 0.0366720 alpha True
initial_level 1686.0000 l.0 False
------------------------------------------------------------------------------
# Display the optimized smoothing parameters learned by the SES model
# This shows the best smoothing level (alpha) chosen during fitting
print("📌 Fitted SES Model Parameters:")
model_SES_autofit.params
📌 Fitted SES Model Parameters:
{'smoothing_level': np.float64(0.03667203108765097),
'smoothing_trend': np.float64(nan),
'smoothing_seasonal': np.float64(nan),
'damping_trend': nan,
'initial_level': np.float64(1686.0),
'initial_trend': np.float64(nan),
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}
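For intuition, the SES recursion behind these parameters is simply new_level = alpha * y + (1 - alpha) * old_level, and the one-step forecast is the current level. A hand-rolled sketch on the first five Sparkling observations, with alpha fixed near the optimized 0.0367:

```python
def ses_fitted(series, alpha, initial_level):
    """One-step-ahead SES: the prediction for time t is the level built from y[0..t-1]."""
    level = initial_level
    fitted = []
    for y in series:
        fitted.append(level)                 # forecast made before observing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level                     # final level = flat forecast ahead

values = [1686, 1591, 2304, 1712, 1471]      # first five Sparkling observations
fitted, forecast = ses_fitted(values, alpha=0.0367, initial_level=1686.0)

# With such a small alpha the level barely reacts to each new observation,
# which is why the SES forecast for the whole test period is nearly flat
print(round(forecast, 2))
```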
# Generate forecast for the same number of periods as in the test set
# The SES model predicts future values based on the optimized smoothing level
SES_test['SES_forecast'] = model_SES_autofit.forecast(steps=len(df_test))
# Display first few forecast values to verify the output
print("📌 First few SES forecast values on Test Data:")
SES_test.head()
📌 First few SES forecast values on Test Data:
| Sparkling | SES_forecast | |
|---|---|---|
| Time_Stamp | ||
| 1992-01-31 | 1577 | 2635.950609 |
| 1992-02-29 | 1667 | 2635.950609 |
| 1992-03-31 | 1993 | 2635.950609 |
| 1992-04-30 | 1997 | 2635.950609 |
| 1992-05-31 | 1783 | 2635.950609 |
# Plot the SES Forecast vs Actual values for both Train and Test sets
plt.figure(figsize=(14, 7)) # Larger figure for clearer visualization
# Plot actual training data
plt.plot(SES_train['Sparkling'], label='Training Data', linewidth=2)
# Plot actual test data
plt.plot(SES_test['Sparkling'], label='Actual Test Data', linewidth=2)
# Plot SES predictions
plt.plot(SES_test['SES_forecast'],
label='SES Forecast (Optimized Alpha)',
linestyle='--', linewidth=2)
# Title and axis labeling
plt.title('Simple Exponential Smoothing Forecast vs Actual', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Model Evaluation for the Optimized 𝛼 ≈ 0.037 : Simple Exponential Smoothing¶
# Calculate RMSE for SES Model on Test Data
rmse_model_test = root_mean_squared_error(
SES_test['Sparkling'], # Actual values
SES_test['SES_forecast'] # SES model predictions
)
# Print formatted result
print(f"✅ SES Model RMSE on Test Data: {rmse_model_test:.3f}")
✅ SES Model RMSE on Test Data: 1293.814
# Create a new row in the results table for the SES model performance
resultsDf_1 = pd.DataFrame(
{'Test RMSE': [rmse_model_test]},
index=['SES_Model'] # ✅ Model name label
)
# Append the SES model performance to the main results table
resultsDf = pd.concat([resultsDf, resultsDf_1])
# Display updated performance comparison
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
Setting Different Alpha Values¶
# Creating an empty DataFrame to store SES performance results
# for different alpha (smoothing level) values
resultsDf_a = pd.DataFrame(columns=['Alpha', 'Train RMSE', 'Test RMSE'])
# Display the initialized DataFrame
print("📊 Initialized Results DataFrame for Alpha Tuning:")
display(resultsDf_a)
📊 Initialized Results DataFrame for Alpha Tuning:
| Alpha | Train RMSE | Test RMSE |
|---|---|---|
alphas = np.arange(0.3, 1.0, 0.1)
for alpha in alphas:
# Fit model with specified smoothing level (alpha)
model_SES_alpha = SimpleExpSmoothing(
SES_train['Sparkling']
).fit(smoothing_level=alpha, optimized=False)
# Predictions for train and test
train_pred = model_SES_alpha.fittedvalues
test_pred = model_SES_alpha.forecast(len(SES_test))
# Compute RMSE (new sklearn function)
rmse_train = root_mean_squared_error(SES_train['Sparkling'], train_pred)
rmse_test = root_mean_squared_error(SES_test['Sparkling'], test_pred)
# Append results to table
resultsDf_a.loc[len(resultsDf_a)] = [alpha, rmse_train, rmse_test]
# Display results
print("📊 SES Model Results for Multiple Alpha Values:")
display(resultsDf_a)
📊 SES Model Results for Multiple Alpha Values:
| Alpha | Train RMSE | Test RMSE | |
|---|---|---|---|
| 0 | 0.3 | 1362.731346 | 1900.058569 |
| 1 | 0.4 | 1356.208919 | 2260.069389 |
| 2 | 0.5 | 1347.944758 | 2606.296390 |
| 3 | 0.6 | 1343.099607 | 2924.118301 |
| 4 | 0.7 | 1343.640190 | 3214.744366 |
| 5 | 0.8 | 1349.991473 | 3483.731806 |
| 6 | 0.9 | 1362.270034 | 3736.981096 |
# Sort the results to find the best alpha (lowest Test RMSE)
resultsDf_a = resultsDf_a.sort_values(by='Test RMSE', ascending=True)
# Display the sorted performance table
print("📊 SES Model Performance Sorted by Test RMSE:")
display(resultsDf_a)
📊 SES Model Performance Sorted by Test RMSE:
| Alpha | Train RMSE | Test RMSE | |
|---|---|---|---|
| 0 | 0.3 | 1362.731346 | 1900.058569 |
| 1 | 0.4 | 1356.208919 | 2260.069389 |
| 2 | 0.5 | 1347.944758 | 2606.296390 |
| 3 | 0.6 | 1343.099607 | 2924.118301 |
| 4 | 0.7 | 1343.640190 | 3214.744366 |
| 5 | 0.8 | 1349.991473 | 3483.731806 |
| 6 | 0.9 | 1362.270034 | 3736.981096 |
Get Best Alpha Value¶
best_alpha = resultsDf_a.iloc[0]['Alpha']
print(f"🏆 Best Alpha: {best_alpha:.2f}")
🏆 Best Alpha: 0.30
Refit SES with Best Alpha & Forecast¶
# Fit the best SES model using alpha = 0.30
best_SES_model = SimpleExpSmoothing(
SES_train['Sparkling']
).fit(smoothing_level=0.30, optimized=False)
Forecast for Test Period¶
SES_train['best_SES_pred'] = best_SES_model.fittedvalues
SES_test['best_SES_pred'] = best_SES_model.forecast(len(SES_test))
SES_test.head()
| Sparkling | SES_forecast | best_SES_pred | |
|---|---|---|---|
| Time_Stamp | |||
| 1992-01-31 | 1577 | 2635.950609 | 3795.337234 |
| 1992-02-29 | 1667 | 2635.950609 | 3795.337234 |
| 1992-03-31 | 1993 | 2635.950609 | 3795.337234 |
| 1992-04-30 | 1997 | 2635.950609 | 3795.337234 |
| 1992-05-31 | 1783 | 2635.950609 | 3795.337234 |
plt.figure(figsize=(18,9))
plt.plot(SES_train['Sparkling'], label='Training Data', linewidth=2)
plt.plot(SES_test['Sparkling'], label='Actual Test Data', linewidth=2)
plt.plot(SES_test['best_SES_pred'],
label='SES Forecast (Alpha = 0.30)',
linestyle='--', linewidth=2)
plt.title('SES Forecast vs Actual - Sparkling Sales', fontsize=16)
plt.xlabel('Time')
plt.ylabel('Sales')
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
# Create a results row for the best-alpha SES model (Alpha = 0.30)
# Use the lowest Test RMSE from the alpha tuning results table
resultsDf_2 = pd.DataFrame(
    {'Test RMSE': [resultsDf_a['Test RMSE'].min()]},
    index=['SES (Alpha = 0.30)']
)
# Append to the main comparison results DataFrame
resultsDf = pd.concat([resultsDf, resultsDf_2])
# Display the updated model comparison table
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
Double Exponential Smoothing (Holt's Model)¶
# Create copies of the train and test datasets for Double Exponential Smoothing (DES)
DES_train = df_train.copy() # Copy of training dataset
DES_test = df_test.copy() # Copy of testing dataset
print("✅ DES datasets created:")
print("Train shape:", DES_train.shape)
print("Test shape:", DES_test.shape)
✅ DES datasets created: Train shape: (144, 1) Test shape: (43, 1)
# Initialize the Holt's Linear Trend (Double Exponential Smoothing) model
# This model captures both LEVEL and TREND in the time series
model_DES = Holt(DES_train['Sparkling'].astype(float))
print("✅ Holt Model initialized successfully for DES.")
✅ Holt Model initialized successfully for DES.
# Fit the Holt (Double Exponential Smoothing) model
# optimized=True by default → automatically selects the best smoothing parameters (alpha & beta)
model_DES_autofit = model_DES.fit(optimized=True)
# Display the smoothing parameters learned by the model
print("✅ DES Model successfully fitted!")
print("📌 Model Parameters:")
print(model_DES_autofit.params)
✅ DES Model successfully fitted!
📌 Model Parameters:
{'smoothing_level': np.float64(0.6678921248101082), 'smoothing_trend': np.float64(0.00567583364200997), 'smoothing_seasonal': np.float64(nan), 'damping_trend': nan, 'initial_level': np.float64(1686.0), 'initial_trend': np.float64(-95.0), 'initial_seasons': array([], dtype=float64), 'use_boxcox': False, 'lamda': None, 'remove_bias': False}
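Conceptually, Holt's method runs two coupled recursions: level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1}) and trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}; the h-step forecast is then level + h * trend, a straight line. A toy sketch (numbers are illustrative, not the Sparkling data):

```python
def holt_forecast(series, alpha, beta, level, trend, h):
    """Holt's linear method: update level and trend per observation, project h steps."""
    for y in series:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (i + 1) * trend for i in range(h)]

fc = holt_forecast([100, 110, 120, 130], alpha=0.5, beta=0.5,
                   level=100.0, trend=10.0, h=3)

# The forecast extends the last estimated trend as a straight line,
# which is exactly why DES cannot reproduce seasonal peaks
print([round(v, 2) for v in fc])
```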
# Check the shape of the DES_test dataset
# This tells us the number of rows (time periods) and columns (features) available for forecasting evaluation
print("📌 Shape of DES Test Dataset (rows, columns):")
DES_test.shape
📌 Shape of DES Test Dataset (rows, columns):
(43, 1)
# Predict future values for the same length as the test dataset
# Using Holt's model with optimized smoothing parameters
DES_test['DES_forecast'] = model_DES_autofit.forecast(steps=len(DES_test))
# Display preview of forecast values
print("📌 First few DES forecast values:")
DES_test.head()
📌 First few DES forecast values:
| Sparkling | DES_forecast | |
|---|---|---|
| Time_Stamp | ||
| 1992-01-31 | 1577 | 5193.610144 |
| 1992-02-29 | 1667 | 5169.598791 |
| 1992-03-31 | 1993 | 5145.587438 |
| 1992-04-30 | 1997 | 5121.576085 |
| 1992-05-31 | 1783 | 5097.564731 |
# Plot DES (Double Exponential Smoothing) Forecast vs Actual Data
plt.figure(figsize=(18,9))
# Training history
plt.plot(DES_train['Sparkling'], label='Training Data', linewidth=2)
# Actual future observations
plt.plot(DES_test['Sparkling'], label='Actual Test Data', linewidth=2)
# DES predictions using optimized alpha & beta
alpha = model_DES_autofit.params['smoothing_level']
beta = model_DES_autofit.params['smoothing_trend']
plt.plot(
DES_test['DES_forecast'],
label=f'DES Forecast (α={alpha:.3f}, β={beta:.3f})',
linestyle='--', linewidth=2
)
plt.title('Double Exponential Smoothing (Holt) – Forecast vs Actual', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
# Get optimized parameters from the fitted Holt (DES) model
alpha = model_DES_autofit.params['smoothing_level']
beta = model_DES_autofit.params['smoothing_trend']
# Compute RMSE on the test set using the DES forecast column
rmse_model_test = root_mean_squared_error(
DES_test['Sparkling'], # actuals
DES_test['DES_forecast'] # predictions from DES
)
# Nicely formatted result with true alpha/beta values
print(f"✅ Double Exponential Smoothing on Test Data — RMSE: {rmse_model_test:.3f} "
f"(α = {alpha:.5f}, β = {beta:.5f})")
✅ Double Exponential Smoothing on Test Data — RMSE: 2654.534 (α = 0.66789, β = 0.00568)
# Create DataFrame entry for DES model performance
resultsDf_DES = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},
    index=['DES_Model']  # Model name label
)
# Append DES results to the main comparison DataFrame
resultsDf = pd.concat([resultsDf, resultsDf_DES])
# Display the updated model leaderboard
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
| DES_Model | 2.654534e+03 |
# Create an empty DataFrame to store Holt (DES) tuning results
# for different Alpha (level smoothing) and Beta (trend smoothing)
resultsDf_b = pd.DataFrame(columns=['Alpha', 'Beta', 'Train RMSE', 'Test RMSE'])
# Display initialized tuning table
print("📊 Initialized Results table for DES Alpha/Beta tuning:")
display(resultsDf_b)
📊 Initialized Results table for DES Alpha/Beta tuning:
| Alpha | Beta | Train RMSE | Test RMSE |
|---|---|---|---|
# Number of observations in the test dataset
print("📌 Number of periods in Test Set:", len(df_test))
📌 Number of periods in Test Set: 43
# Grids for alpha (level) and beta (trend)
alphas = np.arange(0.3, 1.1, 0.1)
betas = np.arange(0.3, 1.1, 0.1)
# Ensure resultsDf_b has the right columns
resultsDf_b = pd.DataFrame(columns=['Alpha', 'Beta', 'Train RMSE', 'Test RMSE'])
# Length of test horizon
h = len(DES_test)
for alpha in alphas:
for beta in betas:
# Fit Holt (DES) with fixed smoothing params
model_ij = Holt(DES_train['Sparkling']).fit(
smoothing_level=alpha,
smoothing_trend=beta,
optimized=False
)
# Predictions (keep them as local arrays/Series)
train_pred = model_ij.fittedvalues
test_pred = model_ij.forecast(h)
# RMSE on train and test
rmse_train = root_mean_squared_error(DES_train['Sparkling'], train_pred)
rmse_test = root_mean_squared_error(DES_test['Sparkling'], test_pred)
# Append a row to the results table
resultsDf_b.loc[len(resultsDf_b)] = [alpha, beta, rmse_train, rmse_test]
# Sort to see the best combination on top
resultsDf_b = resultsDf_b.sort_values(by='Test RMSE', ascending=True)
# Peek at the best few
print("📊 Top DES alpha/beta combos by Test RMSE:")
display(resultsDf_b.head())
📊 Top DES alpha/beta combos by Test RMSE:
| Alpha | Beta | Train RMSE | Test RMSE | |
|---|---|---|---|---|
| 0 | 0.3 | 0.3 | 1599.398181 | 13814.398308 |
| 8 | 0.4 | 0.3 | 1578.236948 | 18044.669982 |
| 1 | 0.3 | 0.4 | 1693.819564 | 19249.257218 |
| 16 | 0.5 | 0.3 | 1538.865716 | 20684.637333 |
| 24 | 0.6 | 0.3 | 1513.676637 | 22536.689632 |
# Display all column names in the DES_test DataFrame
print(list(DES_test.columns))
['Sparkling', 'DES_forecast']
# Extract the Best Performing Row Automatically
best_row = resultsDf_b.sort_values(by='Test RMSE', ascending=True).iloc[0]
best_alpha = best_row['Alpha']
best_beta = best_row['Beta']
best_rmse = best_row['Test RMSE']
best_alpha, best_beta, best_rmse
(np.float64(0.3), np.float64(0.3), np.float64(13814.398308245514))
#Create a Clean, Accurate Performance Entry
# Assign a readable and dynamic model name based on best α and β
model_name = f"DES (Alpha={best_alpha:.2f}, Beta={best_beta:.2f})"
# Create dataframe for this result
resultsDf_3 = pd.DataFrame({'Test RMSE': [best_rmse]}, index=[model_name])
# Append to main results table
resultsDf = pd.concat([resultsDf, resultsDf_3])
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
| DES_Model | 2.654534e+03 |
| DES (Alpha=0.30, Beta=0.30) | 1.381440e+04 |
Triple Exponential Smoothing (Holt-Winters Model)¶
# Creating copies of the train and test datasets for Holt-Winter's Model
TES_train = df_train.copy()
TES_test = df_test.copy()
print("✅ TES datasets created successfully")
print("Train shape:", TES_train.shape)
print("Test shape:", TES_test.shape)
✅ TES datasets created successfully Train shape: (144, 1) Test shape: (43, 1)
# Initialize the Holt-Winters Triple Exponential Smoothing (TES) Model
model_TES = ExponentialSmoothing(
TES_train['Sparkling'],
trend='add', # Additive trend component (linear movement over time)
seasonal='mul', # Multiplicative seasonal effect (seasonal variation grows over time)
seasonal_periods=12 # 12 months → yearly seasonality
)
print("✅ Holt-Winters TES model initialized successfully!")
# NOTE: freq='M' is no longer supported as a parameter
✅ Holt-Winters TES model initialized successfully!
# Fit the Holt-Winters TES model
# optimized=True (the default) automatically tunes alpha, beta, and gamma
model_TES_autofit = model_TES.fit(optimized=True)
# Show the final parameters learned by the model
print("✅ TES model successfully fitted!")
print("📌 Optimized parameters:")
print(model_TES_autofit.params)
✅ TES model successfully fitted!
📌 Optimized parameters:
{'smoothing_level': np.float64(0.07491295595488828), 'smoothing_trend': np.float64(0.07491295595488828), 'smoothing_seasonal': np.float64(0.2833959688663245), 'damping_trend': nan, 'initial_level': np.float64(2356.5020981873167), 'initial_trend': np.float64(-19.55614396441527), 'initial_seasons': array([0.69642481, 0.66263147, 0.8661978 , 0.78371893, 0.6409794 ,
0.62710199, 0.86203455, 1.11207461, 0.90589561, 1.22639337,
1.87827821, 2.43148971]), 'use_boxcox': False, 'lamda': None, 'remove_bias': False}
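With an additive trend and multiplicative seasonality, the h-step-ahead forecast is (level + h * trend) multiplied by the seasonal factor of the target month. A sketch using round numbers in the style of the initial states printed above (the model's fully updated end-of-training states, not these initial ones, drive the actual forecasts, and the month mapping here assumes training ends in December):

```python
# Illustrative state values, loosely based on the initial parameters shown above
level, trend = 2356.5, -19.56
seasonals = [0.70, 0.66, 0.87, 0.78, 0.64, 0.63,
             0.86, 1.11, 0.91, 1.23, 1.88, 2.43]   # Jan..Dec factors

# h-step-ahead forecast: project the trend, then rescale by the month's factor
forecasts = [(level + h * trend) * seasonals[(h - 1) % 12] for h in range(1, 13)]

# The December factor (2.43) produces the familiar year-end spike
print(round(max(forecasts), 1))
```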
# Predict future values for the same length as the test dataset
# Using the optimized Holt-Winters TES model
TES_test['TES_forecast'] = model_TES_autofit.forecast(steps=len(df_test))
# Preview the forecast results
print("📌 First few TES forecast values on Test Data:")
TES_test.head()
📌 First few TES forecast values on Test Data:
| Sparkling | TES_forecast | |
|---|---|---|
| Time_Stamp | ||
| 1992-01-31 | 1577 | 1699.806254 |
| 1992-02-29 | 1667 | 1588.100259 |
| 1992-03-31 | 1993 | 1797.378411 |
| 1992-04-30 | 1997 | 1568.061775 |
| 1992-05-31 | 1783 | 1518.346802 |
# Plot TES Forecast vs Actual
plt.figure(figsize=(18,9))
# Plot Training Data
plt.plot(TES_train['Sparkling'], label='Training Data', linewidth=2)
# Plot Actual Test Data
plt.plot(TES_test['Sparkling'], label='Actual Test Data', linewidth=2)
# Extract optimized parameters
alpha = model_TES_autofit.params['smoothing_level']
beta = model_TES_autofit.params['smoothing_trend']
gamma = model_TES_autofit.params['smoothing_seasonal']
# Plot the TES Forecast
plt.plot(
TES_test['TES_forecast'],
label=f'TES Forecast (α={alpha:.3f}, β={beta:.3f}, γ={gamma:.3f})',
linestyle='--',
linewidth=2
)
# Formatting & Labels
plt.title("Holt-Winters Triple Exponential Smoothing - Forecast vs Actual", fontsize=16)
plt.xlabel("Time", fontsize=12)
plt.ylabel("Sparkling Sales", fontsize=12)
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
# Extract optimized smoothing parameters from the model
alpha = model_TES_autofit.params['smoothing_level']
beta = model_TES_autofit.params['smoothing_trend']
gamma = model_TES_autofit.params['smoothing_seasonal']
# Calculate RMSE on the test set using TES forecast
rmse_model_test = root_mean_squared_error(
TES_test['Sparkling'], # actual values
TES_test['TES_forecast'] # forecast from TES model
)
# Display results properly formatted with real values
print(f"✅ TES Model RMSE on Test Data: {rmse_model_test:.3f} "
f"(α={alpha:.3f}, β={beta:.3f}, γ={gamma:.3f})")
✅ TES Model RMSE on Test Data: 350.053 (α=0.075, β=0.075, γ=0.283)
# Create an empty DataFrame to store TES alpha, beta, gamma tuning results
resultsDf_c = pd.DataFrame(
columns=['Alpha', 'Beta', 'Gamma', 'Train RMSE', 'Test RMSE']
)
# Display initialized table
print("📊 Initialized TES parameter tuning table:")
display(resultsDf_c)
📊 Initialized TES parameter tuning table:
| Alpha | Beta | Gamma | Train RMSE | Test RMSE |
|---|---|---|---|---|
# Ensure results table has clean column names
resultsDf_c = pd.DataFrame(columns=['Alpha', 'Beta', 'Gamma', 'Train RMSE', 'Test RMSE'])
alphas = np.arange(0.3, 1.1, 0.1)
betas = np.arange(0.3, 1.1, 0.1)
gammas = np.arange(0.3, 1.1, 0.1)
h = len(TES_test)
row = 0
for alpha in alphas:
for beta in betas:
for gamma in gammas:
# Fit TES with fixed smoothing params
model = ExponentialSmoothing(
TES_train['Sparkling'],
trend='add',
seasonal='mul',
seasonal_periods=12
).fit(
smoothing_level=alpha,
smoothing_trend=beta,
smoothing_seasonal=gamma,
optimized=False
)
# In-sample fitted values and out-of-sample forecast
train_pred = model.fittedvalues
test_pred = model.forecast(h)
# RMSEs
rmse_train = root_mean_squared_error(TES_train['Sparkling'], train_pred)
rmse_test = root_mean_squared_error(TES_test['Sparkling'], test_pred)
# Append a row
resultsDf_c.loc[row] = [alpha, beta, gamma, rmse_train, rmse_test]
row += 1
# Sort so the best (lowest Test RMSE) is first
resultsDf_c = resultsDf_c.sort_values(by='Test RMSE', ascending=True)
print("📊 Top TES α/β/γ by Test RMSE:")
display(resultsDf_c.head())
📊 Top TES α/β/γ by Test RMSE:
| Alpha | Beta | Gamma | Train RMSE | Test RMSE | |
|---|---|---|---|---|---|
| 128 | 0.5 | 0.3 | 0.3 | 448.489627 | 412.613756 |
| 194 | 0.6 | 0.3 | 0.5 | 541.879579 | 478.752553 |
| 72 | 0.4 | 0.4 | 0.3 | 451.720449 | 485.608082 |
| 260 | 0.7 | 0.3 | 0.7 | 726.763349 | 561.382299 |
| 81 | 0.4 | 0.5 | 0.4 | 508.681964 | 579.245326 |
# 1) Extract the best alpha/beta/gamma combination from the tuning table
best_row = resultsDf_c.sort_values('Test RMSE', ascending=True).iloc[0]
a, b, g = float(best_row['Alpha']), float(best_row['Beta']), float(best_row['Gamma'])

# 2) Refit Holt-Winters with those fixed parameters
best_TES_model = ExponentialSmoothing(
TES_train['Sparkling'],
trend='add',
seasonal='mul',
seasonal_periods=12
).fit(
smoothing_level=a,
smoothing_trend=b,
smoothing_seasonal=g,
optimized=False
)
# 3) Forecast cleanly into a single column
TES_test['TES_bf_forecast'] = best_TES_model.forecast(len(TES_test))
# 4) Plot train, test, and the brute-force (best α/β/γ) forecast
plt.figure(figsize=(18,9))
plt.plot(TES_train['Sparkling'], label='Training Data', linewidth=2)
plt.plot(TES_test['Sparkling'], label='Actual Test Data', linewidth=2)
plt.plot(TES_test['TES_bf_forecast'],
label=f'TES Forecast (α={a:.2f}, β={b:.2f}, γ={g:.2f})',
linestyle='--', linewidth=2)
plt.title('Holt-Winters (TES) — Forecast vs Actual (Grid-Searched α,β,γ)', fontsize=16)
plt.xlabel('Time')
plt.ylabel('Sparkling Sales')
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
# Extract the best performing row automatically
best_row = resultsDf_c.sort_values(by='Test RMSE').iloc[0]
best_alpha = best_row['Alpha']
best_beta = best_row['Beta']
best_gamma = best_row['Gamma']
best_rmse = best_row['Test RMSE']
# Create a readable model label based on actual best values
model_name = f"TES (α={best_alpha:.2f}, β={best_beta:.2f}, γ={best_gamma:.2f})"
# Add this result to the performance leaderboard
resultsDf_2 = pd.DataFrame({'Test RMSE': [best_rmse]}, index=[model_name])
resultsDf = pd.concat([resultsDf, resultsDf_2])
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
| Test RMSE | |
|---|---|
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
| DES_Model | 2.654534e+03 |
| DES (Alpha=0.30, Beta=0.30) | 1.381440e+04 |
| TES (α=0.50, β=0.30, γ=0.30) | 4.126138e+02 |
# Sort the model performance results in ascending order of RMSE
# ✅ Lower RMSE means better model accuracy
resultsDf1 = resultsDf.sort_values(by='Test RMSE', ascending=True)
print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf1)
📊 Model Performance Ranking (Lowest RMSE = Best):
| Test RMSE | |
|---|---|
| TES (α=0.50, β=0.30, γ=0.30) | 4.126138e+02 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
| DES_Model | 2.654534e+03 |
| DES (Alpha=0.30, Beta=0.30) | 1.381440e+04 |
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
Reformatting RMSE Values for Readability¶
# Sort performance results by Test RMSE (lowest = best)
resultsDf1 = resultsDf.sort_values(by='Test RMSE', ascending=True)
print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf1.style.format({'Test RMSE': '{:,.2f}'}))
📊 Model Performance Ranking (Lowest RMSE = Best):
| Test RMSE | |
|---|---|
| TES (α=0.50, β=0.30, γ=0.30) | 412.61 |
| SimpleAverageModel | 1,268.68 |
| SES_Model | 1,293.81 |
| SES (Alpha = 0.30) | 1,900.06 |
| DES_Model | 2,654.53 |
| DES (Alpha=0.30, Beta=0.30) | 13,814.40 |
| Regression On Time Model | 1,840,430.14 |
| NaiveForecast | 15,839,720.95 |
Observations¶
Naive Forecast Model
- Predicts every future point as the last observed sales value.
- Completely fails to capture the seasonal spikes, resulting in a very high error (worst RMSE).
Simple Average Model
- The forecast is flat, based on the long-term average.
- Slightly better than Naive, yet it does not capture seasonal variation, so errors are large in peak months.
Simple Exponential Smoothing (SES)
- Gives more weight to recent data but still ignores trend and seasonality.
- Slightly outperforms the Simple Average model, but remains inadequate when the seasonal patterns are strong.
Double Exponential Smoothing (DES)
- Captures a linear trend but not seasonality.
- Forecasts follow a declining trend that does not match the real seasonal peaks.
- Despite modeling trend, its test RMSE is worse than SES here, and it performs poorly during high-demand periods.
Triple Exponential Smoothing (TES / Holt-Winters)
- Successfully models trend + seasonality, closely following the real values.
- Smoothly matches the seasonal peaks and troughs during the test period.
- Best performance with the lowest RMSE: the most accurate of all the models tested.
Conclusion:
- Holt-Winters TES is the best forecasting model for Sparkling Wine Sales, as it captures the long-term trend and the strong seasonal pattern with the lowest error.
Checking for Stationarity¶
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
# Determine rolling statistics
rolmean = timeseries.rolling(window=12).mean()
rolstd = timeseries.rolling(window=12).std()
# Plot rolling statistics:
orig = plt.plot(timeseries,color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean and Standard Deviation')
plt.show(block=False)
# Perform Dickey-Fuller test:
print('Results of Dickey Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print(dfoutput,'\n')
def test_stationarity(ts, window=12, title="Rolling Mean & Std"):
"""
Check stationarity of a time series using rolling stats + ADF test.
Parameters
----------
ts : pd.Series
Univariate time series (indexed by datetime ideally).
window : int, default=12
Rolling window size (12 = monthly data, 1 year).
title : str
Title for the rolling statistics plot.
Returns
-------
results : dict
Dictionary with ADF test statistic, p-value, lags, nobs, and critical values.
"""
# Ensure Series and drop missing values (ADF cannot handle NaN)
ts = pd.Series(ts).dropna()
# --- Rolling statistics
rolmean = ts.rolling(window=window, min_periods=window).mean()
rolstd = ts.rolling(window=window, min_periods=window).std()
# --- Plot
plt.figure(figsize=(12, 5))
plt.plot(ts, label='Original', linewidth=2)
plt.plot(rolmean, label=f'Rolling Mean (win={window})', linestyle='--')
plt.plot(rolstd, label=f'Rolling Std (win={window})', linestyle=':')
plt.legend(loc='best')
plt.title(title)
plt.grid(True, linestyle='--', alpha=0.4)
plt.tight_layout()
plt.show()
# --- ADF Test
print("Results of Augmented Dickey-Fuller (ADF) Test")
dftest = adfuller(ts.values, autolag='AIC') # use values to avoid index issues
adf_stat, pval, used_lag, nobs = dftest[0], dftest[1], dftest[2], dftest[3]
crit_vals = dftest[4]
# Pretty print
print(f"Test Statistic : {adf_stat: .4f}")
print(f"p-value : {pval: .6f}")
print(f"# Lags Used : {used_lag}")
print(f"# Observations : {nobs}")
for k, v in crit_vals.items():
print(f"Critical Value ({k}) : {v: .4f}")
# Simple verdict
verdict = "Stationary (reject H0)" if pval < 0.05 else "Non-stationary (fail to reject H0)"
print(f"\nVerdict @ 5% : {verdict}\n")
return {
"adf_stat": adf_stat,
"p_value": pval,
"used_lag": used_lag,
"nobs": nobs,
"critical_values": crit_vals,
"verdict_5pct": verdict
}
# ✅ Checking stationarity for the Training portion of Sparkling Sales
# This helps determine our differencing needs for SARIMA modeling
test_stationarity(df_train['Sparkling'])
Results of Augmented Dickey-Fuller (ADF) Test Test Statistic : -1.2658 p-value : 0.644683 # Lags Used : 12 # Observations : 131 Critical Value (1%) : -3.4813 Critical Value (5%) : -2.8839 Critical Value (10%) : -2.5787 Verdict @ 5% : Non-stationary (fail to reject H0)
{'adf_stat': np.float64(-1.2657713392815138),
'p_value': np.float64(0.6446829195512411),
'used_lag': 12,
'nobs': 131,
'critical_values': {'1%': np.float64(-3.481281802271349),
'5%': np.float64(-2.883867891664528),
'10%': np.float64(-2.5786771965503177)},
'verdict_5pct': 'Non-stationary (fail to reject H0)'}
Observations:
ADF Statistic: -1.2658
p-value: 0.6447
Critical Value @ 5%: -2.8839
Conclusion:
Fail to reject the null hypothesis (H0)
The p-value is significantly higher than 0.05 and the ADF statistic is less negative than the critical value, indicating that the Sparkling Wine Sales series is non-stationary. This confirms the presence of trend and seasonal effects observed in the earlier plots.
ADF Test on the First-Differenced Training Series
dftest = adfuller(df_train['Sparkling'].diff().dropna(), regression='ct')
print('DF test statistic is %3.3f' % dftest[0])
print('DF test p-value is', dftest[1])
print('DF test number of lags used is', dftest[2])
DF test statistic is -8.628 DF test p-value is 2.5490597847949586e-12 DF test number of lags used is 11
# ADF on first-differenced training series (removes linear trend)
# regression='ct' includes constant + linear trend in the test regression
ts_diff1 = df_train['Sparkling'].diff().dropna()
dftest = adfuller(ts_diff1.values, regression='ct', autolag='AIC')
# Unpack results
adf_stat, pval, used_lag, nobs = dftest[0], dftest[1], dftest[2], dftest[3]
crit_vals = dftest[4]
# Pretty print
print("Results of Augmented Dickey-Fuller (ADF) on df_train ΔSparkling (regression='ct')")
print(f"Test Statistic : {adf_stat: .4f}")
print(f"p-value : {pval: .6f}")
print(f"# Lags Used : {used_lag}")
print(f"# Observations : {nobs}")
for k, v in crit_vals.items():
print(f"Critical Value ({k}) : {v: .4f}")
# Verdict at 5%
verdict = "Stationary (reject H0)" if pval < 0.05 else "Non-stationary (fail to reject H0)"
print(f"\nVerdict @ 5% : {verdict}")
Results of Augmented Dickey-Fuller (ADF) on df_train ΔSparkling (regression='ct') Test Statistic : -8.6282 p-value : 0.000000 # Lags Used : 11 # Observations : 131 Critical Value (1%) : -4.0296 Critical Value (5%) : -3.4446 Critical Value (10%) : -3.1470 Verdict @ 5% : Stationary (reject H0)
plt.figure(figsize=(14, 6)) # Bigger chart for better visibility
# Plot the training data for Sparkling Sales
df_train['Sparkling'].plot(linewidth=2, label='Training Sparkling Sales')
# Improve readability
plt.title('Training Data – Sparkling Wine Sales', fontsize=14)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sales', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='best')
plt.tight_layout()
plt.show()
Observations:
ADF Statistic: -8.6282
p-value: 0.0000
Critical Value @ 5%: -3.4446
Conclusion:
Reject the null hypothesis (H₀)
After applying first differencing, the series becomes stationary. The low p-value (0.0000) and ADF statistic being much lower than the critical value indicate that the transformed data now has stable mean and variance, making it suitable for ARIMA/SARIMA modeling.
Automated ARIMA Model based on lowest AIC¶
import itertools
# Define the parameter ranges for ARIMA (p, d, q)
# p = AR order (number of lag observations included)
# d = differencing order (to make the series stationary)
# q = MA order (size of moving average window)
p = q = range(0, 4) # p and q will each take values: 0,1,2,3
d = range(1, 2) # d will only take value: 1 (based on stationarity test)
# Creating all combinations of (p,d,q) using Cartesian product
# itertools.product generates every possible tuple like (0,1,0), (1,1,2), etc.
pdq = list(itertools.product(p, d, q))
# Display the generated model parameter combinations
print('Examples of the parameter combinations for the ARIMA models')
for i in range(0, len(pdq)):
# Format each tuple into a readable model notation
print('Model : ARIMA{}'.format(pdq[i]))
Examples of the parameter combinations for the ARIMA models Model : ARIMA(0, 1, 0) Model : ARIMA(0, 1, 1) Model : ARIMA(0, 1, 2) Model : ARIMA(0, 1, 3) Model : ARIMA(1, 1, 0) Model : ARIMA(1, 1, 1) Model : ARIMA(1, 1, 2) Model : ARIMA(1, 1, 3) Model : ARIMA(2, 1, 0) Model : ARIMA(2, 1, 1) Model : ARIMA(2, 1, 2) Model : ARIMA(2, 1, 3) Model : ARIMA(3, 1, 0) Model : ARIMA(3, 1, 1) Model : ARIMA(3, 1, 2) Model : ARIMA(3, 1, 3)
# Creating an empty dataframe to store ARIMA model results
# Columns:
# param → the (p, d, q) parameter combination tested for ARIMA
# AIC → the AIC score for each fitted model
ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
# Display the initialized table
print("✅ Initialized ARIMA AIC results table:")
ARIMA_AIC
✅ Initialized ARIMA AIC results table:
| param | AIC |
|---|---|
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore") # Suppress convergence warnings for cleaner output
# Loop through each (p, d, q) combination in pdq
for param in pdq:
try:
# Fit ARIMA(p,d,q) model on training series
model = ARIMA(df_train['Sparkling'], order=param).fit()
# Print AIC for each model
print(f"ARIMA{param} → AIC = {model.aic:.2f}")
# Store model parameters and AIC into results dataframe
ARIMA_AIC.loc[len(ARIMA_AIC)] = [param, model.aic]
except Exception as e:
# Some parameter combinations fail to converge → skip safely
print(f"ARIMA{param} → ❌ Failed ({e})")
continue
ARIMA(0, 1, 0) → AIC = 2476.75 ARIMA(0, 1, 1) → AIC = 2470.90 ARIMA(0, 1, 2) → AIC = 2439.47 ARIMA(0, 1, 3) → AIC = 2439.13 ARIMA(1, 1, 0) → AIC = 2474.92 ARIMA(1, 1, 1) → AIC = 2440.49 ARIMA(1, 1, 2) → AIC = 2439.71 ARIMA(1, 1, 3) → AIC = 2440.81 ARIMA(2, 1, 0) → AIC = 2468.71 ARIMA(2, 1, 1) → AIC = 2438.87 ARIMA(2, 1, 2) → AIC = 2419.17 ARIMA(2, 1, 3) → AIC = 2438.37 ARIMA(3, 1, 0) → AIC = 2466.35 ARIMA(3, 1, 1) → AIC = 2440.43 ARIMA(3, 1, 2) → AIC = 2435.48 ARIMA(3, 1, 3) → AIC = 2427.51
# Sort ARIMA results by AIC in ascending order (lowest AIC = best model)
best_ARIMA = ARIMA_AIC.sort_values(by='AIC', ascending=True).reset_index(drop=True)
# Display the top 5 best-performing ARIMA models based on AIC
print("📊 Top ARIMA(p,d,q) models ranked by lowest AIC:")
display(best_ARIMA.head().style.format({'AIC': '{:,.2f}'}))
📊 Top ARIMA(p,d,q) models ranked by lowest AIC:
| param | AIC | |
|---|---|---|
| 0 | (2, 1, 2) | 2,419.17 |
| 1 | (3, 1, 3) | 2,427.51 |
| 2 | (3, 1, 2) | 2,435.48 |
| 3 | (2, 1, 3) | 2,438.37 |
| 4 | (2, 1, 1) | 2,438.87 |
# ✅ Fit the best ARIMA model found from AIC ranking
# Using only the target variable (Sparkling), not the full dataframe
auto_ARIMA = ARIMA(df_train['Sparkling'], order=(2,1,2))
# Fit the model to learn AR, I, MA parameters
results_auto_ARIMA = auto_ARIMA.fit()
# Display full model summary including AIC & coefficients
print("📌 ARIMA(2,1,2) Model Summary:\n")
print(results_auto_ARIMA.summary())
📌 ARIMA(2,1,2) Model Summary:
SARIMAX Results
==============================================================================
Dep. Variable: Sparkling No. Observations: 144
Model: ARIMA(2, 1, 2) Log Likelihood -1204.583
Date: Sun, 02 Nov 2025 AIC 2419.167
Time: 13:44:37 BIC 2433.981
Sample: 01-31-1980 HQIC 2425.187
- 12-31-1991
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 1.3214 0.043 30.410 0.000 1.236 1.407
ar.L2 -0.5501 0.062 -8.906 0.000 -0.671 -0.429
ma.L1 -1.9912 0.105 -18.931 0.000 -2.197 -1.785
ma.L2 0.9995 0.106 9.473 0.000 0.793 1.206
sigma2 1.136e+06 1.88e-07 6.05e+12 0.000 1.14e+06 1.14e+06
===================================================================================
Ljung-Box (L1) (Q): 0.07 Jarque-Bera (JB): 15.12
Prob(Q): 0.80 Prob(JB): 0.00
Heteroskedasticity (H): 2.03 Skew: 0.61
Prob(H) (two-sided): 0.02 Kurtosis: 4.03
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.15e+28. Standard errors may be unstable.
# Plotting ARIMA diagnostic plots to evaluate residual behavior
# This includes:
# ✅ Standardized residuals (look for randomness)
# ✅ Q-Q plot (check normal distribution)
# ✅ ACF of residuals (look for no auto-correlation)
# ✅ Histogram of residuals (normal bell curve)
results_auto_ARIMA.plot_diagnostics(figsize=(14, 10))
plt.suptitle("ARIMA Model Diagnostics", fontsize=14, y=1.02)
plt.show()
Predict on the test set using this model and evaluate its performance.
# ✅ Forecast future values using the fitted ARIMA model
# steps=len(df_test) → predict the same number of periods as the Test dataset
predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(df_test))
## Mean Absolute Percentage Error (MAPE) - initial function definition
def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean((np.abs(y_true - y_pred)) / (y_true)) * 100
# Note: this first version divides by the raw actuals; the refined version below
# guards against division by zero.
# ✅ MAPE (Mean Absolute Percentage Error) function
# Measures the forecast error as a percentage of the actual values
# multiplied by 100 to express as a % error
# np.clip is used to avoid division by zero issues if any actual value = 0
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred) # ensure arrays
return np.mean(np.abs((y_true - y_pred) / np.clip(y_true, 1e-10, None))) * 100
# ^ avoid 0 division by replacing zeros with tiny number (1e-10)
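A quick sanity check of the MAPE helper on values where the answer is known by hand: a 10% miss on each of two points should give a MAPE of 10%.

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / np.clip(y_true, 1e-10, None))) * 100

# |100-110|/100 = 0.10 and |200-180|/200 = 0.10 → mean 0.10 → 10% (up to float rounding)
print(mean_absolute_percentage_error([100, 200], [110, 180]))
```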
# ✅ mean_squared_error imported to compute RMSE:
# RMSE = sqrt(MSE)
# Lower RMSE → better forecast accuracy
# ✅ Evaluate ARIMA(2,1,2) Forecast Performance
# RMSE → Root Mean Squared Error
# Measures forecast error in the same units as sales.
# Lower RMSE = more accurate forecast.
rmse = root_mean_squared_error(df_test['Sparkling'], predicted_auto_ARIMA)
# MAPE → Mean Absolute Percentage Error
# Measures the average percentage difference between forecast and actual values.
# Lower MAPE (%) = better forecast quality relative to actual scale.
mape = mean_absolute_percentage_error(df_test['Sparkling'], predicted_auto_ARIMA)
# ✅ Display results with formatted text for readability
print(f"📌 ARIMA(2,1,2) — RMSE: {rmse:,.2f} | MAPE: {mape:.2f}%")
📌 ARIMA(2,1,2) — RMSE: 1,309.63 | MAPE: 42.11%
from math import sqrt
from sklearn.metrics import mean_squared_error
# ✅ RMSE: Root Mean Squared Error calculation
# mean_squared_error → returns MSE
# sqrt(MSE) → converts MSE to RMSE (original units of sales)
rmse = sqrt(mean_squared_error(df_test['Sparkling'], predicted_auto_ARIMA))
# Display the RMSE result rounded for clarity
print(f"📌 ARIMA(2,1,2) — RMSE: {rmse:,.2f}")
📌 ARIMA(2,1,2) — RMSE: 1,309.63
# ✅ Create a performance table (DataFrame) to store forecast accuracy metrics
# rmse → Root Mean Squared Error (lower = better accuracy)
# mape → Mean Absolute Percentage Error (lower % = better accuracy)
# index → Model name for easy comparison later
resultsDf = pd.DataFrame(
{'RMSE': rmse, 'MAPE': mape},
index=['ARIMA(2,1,2)']
)
# ✅ Display the results table for ARIMA model
resultsDf
| RMSE | MAPE | |
|---|---|---|
| ARIMA(2,1,2) | 1309.631475 | 42.106197 |
Observations:
- Coefficients are mostly statistically significant.
- Diagnostic plots show residuals that are close to normal.
- No major autocorrelation remains in the residuals.
- The model fits reasonably well.
- Slight skew and kurtosis indicate the model can still be improved.
ARIMA(2,1,2) delivers a lower forecast error and better-behaved residuals, so it captures the underlying trend well and is a reliable model for this series.
Automated SARIMA Model based on lowest AIC¶
# ✅ Autocorrelation plot on first-differenced training data
# First difference removes trend → helps us inspect remaining correlations
# missing='drop' handles the NaN created by differencing
# lags=40 shows the first 40 lag correlations
plot_acf(df_train['Sparkling'].diff().dropna(),
title='Training Data Autocorrelation (After First Difference)',
lags=40);
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
import itertools
# ✅ Define parameter ranges for ARIMA(p,d,q)
# p = AR order → number of autoregressive lags
# d = differencing order → to remove trend (we already found d=1)
# q = MA order → number of lagged forecast errors in the model
p = q = range(0, 4) # p and q will take values 0,1,2,3
d = range(1, 2) # d = 1 based on ADF test
# ✅ Define seasonal parameter ranges for SARIMA(P,D,Q,s)
# P = seasonal AR order
# D = seasonal differencing (start with 0 since we allow the model to choose)
# Q = seasonal MA order
# s = 12 months (seasonality period = yearly cycles)
D = range(0, 1) # allowing D = 0 initially (we can later test D=1)
pdq = list(itertools.product(p, d, q)) # non-seasonal combos
PDQ = [(x[0], x[1], x[2], 12) # seasonal combos with period=12
for x in list(itertools.product(p, D, q))]
# ✅ Print sample model combinations
print('Examples of the parameter combinations for the SARIMA Model:')
for i in range(min(10, len(pdq))): # show first 10 only for brevity
print(f"SARIMA order: {pdq[i]} x Seasonal order: {PDQ[i]}")
Examples of the parameter combinations for the SARIMA Model: SARIMA order: (0, 1, 0) x Seasonal order: (0, 0, 0, 12) SARIMA order: (0, 1, 1) x Seasonal order: (0, 0, 1, 12) SARIMA order: (0, 1, 2) x Seasonal order: (0, 0, 2, 12) SARIMA order: (0, 1, 3) x Seasonal order: (0, 0, 3, 12) SARIMA order: (1, 1, 0) x Seasonal order: (1, 0, 0, 12) SARIMA order: (1, 1, 1) x Seasonal order: (1, 0, 1, 12) SARIMA order: (1, 1, 2) x Seasonal order: (1, 0, 2, 12) SARIMA order: (1, 1, 3) x Seasonal order: (1, 0, 3, 12) SARIMA order: (2, 1, 0) x Seasonal order: (2, 0, 0, 12) SARIMA order: (2, 1, 1) x Seasonal order: (2, 0, 1, 12)
# ✅ Create an empty DataFrame to store SARIMA model performance
# Columns include:
# param → (p,d,q) non-seasonal parameters
# seasonal → (P,D,Q,s) seasonal parameters (s = seasonal period)
# AIC → Akaike Information Criterion (lower = better model fit)
SARIMA_AIC = pd.DataFrame(columns=['param', 'seasonal', 'AIC'])
# Display the empty tracking table
SARIMA_AIC
| param | seasonal | AIC |
|---|---|---|
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore") # Suppress convergence warnings
# ✅ Loop through all (p,d,q) and seasonal (P,D,Q,s) combinations
for param in pdq:
for param_seasonal in PDQ:
try:
# Build SARIMA model
SARIMA_model = sm.tsa.statespace.SARIMAX(
df_train['Sparkling'],
order=param,
seasonal_order=param_seasonal,
enforce_stationarity=False,
enforce_invertibility=False
)
# Fit model (maxiter increases chance of convergence)
results_SARIMA = SARIMA_model.fit(maxiter=1000)
# Print progress
print(f"SARIMA{param} x {param_seasonal} → AIC = {results_SARIMA.aic:.2f}")
# ✅ Save results in the AIC comparison table
SARIMA_AIC.loc[len(SARIMA_AIC)] = [param, param_seasonal, results_SARIMA.aic]
except Exception as e:
# Prevent loop crash if a model fails to converge
print(f"SARIMA{param} x {param_seasonal} → ❌ Failed ({str(e)[:50]})")
continue
SARIMA(0, 1, 0) x (0, 0, 0, 12) → AIC = 2460.44 SARIMA(0, 1, 0) x (0, 0, 1, 12) → AIC = 2153.45 SARIMA(0, 1, 0) x (0, 0, 2, 12) → AIC = 1913.33 SARIMA(0, 1, 0) x (0, 0, 3, 12) → AIC = 3893.93 SARIMA(0, 1, 0) x (1, 0, 0, 12) → AIC = 2019.72 SARIMA(0, 1, 0) x (1, 0, 1, 12) → AIC = 1989.78 SARIMA(0, 1, 0) x (1, 0, 2, 12) → AIC = 1814.41 SARIMA(0, 1, 0) x (1, 0, 3, 12) → AIC = 3400.80 SARIMA(0, 1, 0) x (2, 0, 0, 12) → AIC = 1830.01 SARIMA(0, 1, 0) x (2, 0, 1, 12) → AIC = 1830.00 SARIMA(0, 1, 0) x (2, 0, 2, 12) → AIC = 1815.80 SARIMA(0, 1, 0) x (2, 0, 3, 12) → AIC = 3929.08 SARIMA(0, 1, 0) x (3, 0, 0, 12) → AIC = 1651.61 SARIMA(0, 1, 0) x (3, 0, 1, 12) → AIC = 1653.48 SARIMA(0, 1, 0) x (3, 0, 2, 12) → AIC = 1654.87 SARIMA(0, 1, 0) x (3, 0, 3, 12) → AIC = 3403.64 SARIMA(0, 1, 1) x (0, 0, 0, 12) → AIC = 2437.95 SARIMA(0, 1, 1) x (0, 0, 1, 12) → AIC = 2122.82 SARIMA(0, 1, 1) x (0, 0, 2, 12) → AIC = 1879.93 SARIMA(0, 1, 1) x (0, 0, 3, 12) → AIC = 4581.26 SARIMA(0, 1, 1) x (1, 0, 0, 12) → AIC = 1972.08 SARIMA(0, 1, 1) x (1, 0, 1, 12) → AIC = 1917.64 SARIMA(0, 1, 1) x (1, 0, 2, 12) → AIC = 1749.38 SARIMA(0, 1, 1) x (1, 0, 3, 12) → AIC = 3432.61 SARIMA(0, 1, 1) x (2, 0, 0, 12) → AIC = 1782.29 SARIMA(0, 1, 1) x (2, 0, 1, 12) → AIC = 1778.61 SARIMA(0, 1, 1) x (2, 0, 2, 12) → AIC = 1749.81 SARIMA(0, 1, 1) x (2, 0, 3, 12) → AIC = 3399.29 SARIMA(0, 1, 1) x (3, 0, 0, 12) → AIC = 1606.80 SARIMA(0, 1, 1) x (3, 0, 1, 12) → AIC = 1608.41 SARIMA(0, 1, 1) x (3, 0, 2, 12) → AIC = 1609.48 SARIMA(0, 1, 1) x (3, 0, 3, 12) → AIC = 3417.83 SARIMA(0, 1, 2) x (0, 0, 0, 12) → AIC = 2392.74 SARIMA(0, 1, 2) x (0, 0, 1, 12) → AIC = 2086.23 SARIMA(0, 1, 2) x (0, 0, 2, 12) → AIC = 1847.17 SARIMA(0, 1, 2) x (0, 0, 3, 12) → AIC = 2888.92 SARIMA(0, 1, 2) x (1, 0, 0, 12) → AIC = 1964.42 SARIMA(0, 1, 2) x (1, 0, 1, 12) → AIC = 1901.33 SARIMA(0, 1, 2) x (1, 0, 2, 12) → AIC = 1733.98 SARIMA(0, 1, 2) x (1, 0, 3, 12) → AIC = 4110.47 SARIMA(0, 1, 2) x (2, 0, 0, 12) → AIC = 1777.82 SARIMA(0, 1, 2) x (2, 0, 1, 12) 
→ AIC = 1777.17 SARIMA(0, 1, 2) x (2, 0, 2, 12) → AIC = 1734.45 SARIMA(0, 1, 2) x (2, 0, 3, 12) → AIC = 4056.90 SARIMA(0, 1, 2) x (3, 0, 0, 12) → AIC = 1604.49 SARIMA(0, 1, 2) x (3, 0, 1, 12) → AIC = 1606.35 SARIMA(0, 1, 2) x (3, 0, 2, 12) → AIC = 1607.99 SARIMA(0, 1, 2) x (3, 0, 3, 12) → AIC = 4057.03 SARIMA(0, 1, 3) x (0, 0, 0, 12) → AIC = 2374.68 SARIMA(0, 1, 3) x (0, 0, 1, 12) → AIC = 2072.04 SARIMA(0, 1, 3) x (0, 0, 2, 12) → AIC = 1831.95 SARIMA(0, 1, 3) x (0, 0, 3, 12) → AIC = 4082.66 SARIMA(0, 1, 3) x (1, 0, 0, 12) → AIC = 1966.34 SARIMA(0, 1, 3) x (1, 0, 1, 12) → AIC = 1887.32 SARIMA(0, 1, 3) x (1, 0, 2, 12) → AIC = 1719.12 SARIMA(0, 1, 3) x (1, 0, 3, 12) → AIC = 3522.85 SARIMA(0, 1, 3) x (2, 0, 0, 12) → AIC = 1778.50 SARIMA(0, 1, 3) x (2, 0, 1, 12) → AIC = 1777.58 SARIMA(0, 1, 3) x (2, 0, 2, 12) → AIC = 1719.95 SARIMA(0, 1, 3) x (2, 0, 3, 12) → AIC = 3777.39 SARIMA(0, 1, 3) x (3, 0, 0, 12) → AIC = 1604.84 SARIMA(0, 1, 3) x (3, 0, 1, 12) → AIC = 1615.14 SARIMA(0, 1, 3) x (3, 0, 2, 12) → AIC = 1607.90 SARIMA(0, 1, 3) x (3, 0, 3, 12) → AIC = 3858.20 SARIMA(1, 1, 0) x (0, 0, 0, 12) → AIC = 2458.62 SARIMA(1, 1, 0) x (0, 0, 1, 12) → AIC = 2151.91 SARIMA(1, 1, 0) x (0, 0, 2, 12) → AIC = 1910.75 SARIMA(1, 1, 0) x (0, 0, 3, 12) → AIC = 3445.34 SARIMA(1, 1, 0) x (1, 0, 0, 12) → AIC = 1992.39 SARIMA(1, 1, 0) x (1, 0, 1, 12) → AIC = 1972.80 SARIMA(1, 1, 0) x (1, 0, 2, 12) → AIC = 1800.78 SARIMA(1, 1, 0) x (1, 0, 3, 12) → AIC = 4164.39 SARIMA(1, 1, 0) x (2, 0, 0, 12) → AIC = 1803.45 SARIMA(1, 1, 0) x (2, 0, 1, 12) → AIC = 1801.44 SARIMA(1, 1, 0) x (2, 0, 2, 12) → AIC = 1801.13 SARIMA(1, 1, 0) x (2, 0, 3, 12) → AIC = 4129.92 SARIMA(1, 1, 0) x (3, 0, 0, 12) → AIC = 1625.52 SARIMA(1, 1, 0) x (3, 0, 1, 12) → AIC = 1627.42 SARIMA(1, 1, 0) x (3, 0, 2, 12) → AIC = 1628.97 SARIMA(1, 1, 0) x (3, 0, 3, 12) → AIC = 4148.10 SARIMA(1, 1, 1) x (0, 0, 0, 12) → AIC = 2410.09 SARIMA(1, 1, 1) x (0, 0, 1, 12) → AIC = 2104.01 SARIMA(1, 1, 1) x (0, 0, 2, 12) → AIC = 1864.14 SARIMA(1, 1, 1) 
x (0, 0, 3, 12) → AIC = 12.00 SARIMA(1, 1, 1) x (1, 0, 0, 12) → AIC = 1949.81 SARIMA(1, 1, 1) x (1, 0, 1, 12) → AIC = 1917.69 SARIMA(1, 1, 1) x (1, 0, 2, 12) → AIC = 1748.91 SARIMA(1, 1, 1) x (1, 0, 3, 12) → AIC = 3677.74 SARIMA(1, 1, 1) x (2, 0, 0, 12) → AIC = 1765.44 SARIMA(1, 1, 1) x (2, 0, 1, 12) → AIC = 1821.52 SARIMA(1, 1, 1) x (2, 0, 2, 12) → AIC = 1749.65 SARIMA(1, 1, 1) x (2, 0, 3, 12) → AIC = 3568.12 SARIMA(1, 1, 1) x (3, 0, 0, 12) → AIC = 1591.61 SARIMA(1, 1, 1) x (3, 0, 1, 12) → AIC = 1593.44 SARIMA(1, 1, 1) x (3, 0, 2, 12) → AIC = 1595.09 SARIMA(1, 1, 1) x (3, 0, 3, 12) → AIC = 3571.64 SARIMA(1, 1, 2) x (0, 0, 0, 12) → AIC = 2389.71 SARIMA(1, 1, 2) x (0, 0, 1, 12) → AIC = 2087.93 SARIMA(1, 1, 2) x (0, 0, 2, 12) → AIC = 1846.45 SARIMA(1, 1, 2) x (0, 0, 3, 12) → AIC = 3791.72 SARIMA(1, 1, 2) x (1, 0, 0, 12) → AIC = 1945.43 SARIMA(1, 1, 2) x (1, 0, 1, 12) → AIC = 1900.84 SARIMA(1, 1, 2) x (1, 0, 2, 12) → AIC = 1732.03 SARIMA(1, 1, 2) x (1, 0, 3, 12) → AIC = 3727.07 SARIMA(1, 1, 2) x (2, 0, 0, 12) → AIC = 1762.33 SARIMA(1, 1, 2) x (2, 0, 1, 12) → AIC = 1761.83 SARIMA(1, 1, 2) x (2, 0, 2, 12) → AIC = 1732.95 SARIMA(1, 1, 2) x (2, 0, 3, 12) → AIC = 3493.14 SARIMA(1, 1, 2) x (3, 0, 0, 12) → AIC = 1589.31 SARIMA(1, 1, 2) x (3, 0, 1, 12) → AIC = 1591.10 SARIMA(1, 1, 2) x (3, 0, 2, 12) → AIC = 1592.94 SARIMA(1, 1, 2) x (3, 0, 3, 12) → AIC = 3427.93 SARIMA(1, 1, 3) x (0, 0, 0, 12) → AIC = 2374.98 SARIMA(1, 1, 3) x (0, 0, 1, 12) → AIC = 2069.10 SARIMA(1, 1, 3) x (0, 0, 2, 12) → AIC = 1827.84 SARIMA(1, 1, 3) x (0, 0, 3, 12) → AIC = 343.25 SARIMA(1, 1, 3) x (1, 0, 0, 12) → AIC = 1947.46 SARIMA(1, 1, 3) x (1, 0, 1, 12) → AIC = 1888.86 SARIMA(1, 1, 3) x (1, 0, 2, 12) → AIC = 1719.19 SARIMA(1, 1, 3) x (1, 0, 3, 12) → AIC = 4050.89 SARIMA(1, 1, 3) x (2, 0, 0, 12) → AIC = 1764.26 SARIMA(1, 1, 3) x (2, 0, 1, 12) → AIC = 1763.77 SARIMA(1, 1, 3) x (2, 0, 2, 12) → AIC = 1720.32 SARIMA(1, 1, 3) x (2, 0, 3, 12) → AIC = 3740.28 SARIMA(1, 1, 3) x (3, 0, 0, 12) → AIC = 1591.31 
SARIMA(1, 1, 3) x (3, 0, 1, 12) → AIC = 1595.51 SARIMA(1, 1, 3) x (3, 0, 2, 12) → AIC = 1594.98 SARIMA(1, 1, 3) x (3, 0, 3, 12) → AIC = 3359.37 SARIMA(2, 1, 0) x (0, 0, 0, 12) → AIC = 2435.65 SARIMA(2, 1, 0) x (0, 0, 1, 12) → AIC = 2144.87 SARIMA(2, 1, 0) x (0, 0, 2, 12) → AIC = 1901.48 SARIMA(2, 1, 0) x (0, 0, 3, 12) → AIC = 4199.17 SARIMA(2, 1, 0) x (1, 0, 0, 12) → AIC = 1961.62 SARIMA(2, 1, 0) x (1, 0, 1, 12) → AIC = 1941.61 SARIMA(2, 1, 0) x (1, 0, 2, 12) → AIC = 1784.97 SARIMA(2, 1, 0) x (1, 0, 3, 12) → AIC = 3612.58 SARIMA(2, 1, 0) x (2, 0, 0, 12) → AIC = 1774.11 SARIMA(2, 1, 0) x (2, 0, 1, 12) → AIC = 1772.34 SARIMA(2, 1, 0) x (2, 0, 2, 12) → AIC = 1770.76 SARIMA(2, 1, 0) x (2, 0, 3, 12) → AIC = 3068.95 SARIMA(2, 1, 0) x (3, 0, 0, 12) → AIC = 1596.95 SARIMA(2, 1, 0) x (3, 0, 1, 12) → AIC = 1598.62 SARIMA(2, 1, 0) x (3, 0, 2, 12) → AIC = 1599.19 SARIMA(2, 1, 0) x (3, 0, 3, 12) → AIC = 3548.30 SARIMA(2, 1, 1) x (0, 0, 0, 12) → AIC = 2408.43 SARIMA(2, 1, 1) x (0, 0, 1, 12) → AIC = 2103.53 SARIMA(2, 1, 1) x (0, 0, 2, 12) → AIC = 1862.25 SARIMA(2, 1, 1) x (0, 0, 3, 12) → AIC = 2743.57 SARIMA(2, 1, 1) x (1, 0, 0, 12) → AIC = 1973.64 SARIMA(2, 1, 1) x (1, 0, 1, 12) → AIC = 1917.21 SARIMA(2, 1, 1) x (1, 0, 2, 12) → AIC = 1748.75 SARIMA(2, 1, 1) x (1, 0, 3, 12) → AIC = 3068.07 SARIMA(2, 1, 1) x (2, 0, 0, 12) → AIC = 1751.70 SARIMA(2, 1, 1) x (2, 0, 1, 12) → AIC = 1750.03 SARIMA(2, 1, 1) x (2, 0, 2, 12) → AIC = 1749.38 SARIMA(2, 1, 1) x (2, 0, 3, 12) → AIC = 3101.90 SARIMA(2, 1, 1) x (3, 0, 0, 12) → AIC = 1578.22 SARIMA(2, 1, 1) x (3, 0, 1, 12) → AIC = 1580.21 SARIMA(2, 1, 1) x (3, 0, 2, 12) → AIC = 1601.18 SARIMA(2, 1, 1) x (3, 0, 3, 12) → AIC = 3964.28 SARIMA(2, 1, 2) x (0, 0, 0, 12) → AIC = 2370.98 SARIMA(2, 1, 2) x (0, 0, 1, 12) → AIC = 2088.18 SARIMA(2, 1, 2) x (0, 0, 2, 12) → AIC = 1848.44 SARIMA(2, 1, 2) x (0, 0, 3, 12) → AIC = 3697.94 SARIMA(2, 1, 2) x (1, 0, 0, 12) → AIC = 1931.14 SARIMA(2, 1, 2) x (1, 0, 1, 12) → AIC = 1902.82 SARIMA(2, 1, 2) x (1, 0, 2, 12) 
→ AIC = 1734.03 SARIMA(2, 1, 2) x (1, 0, 3, 12) → AIC = 3901.77 SARIMA(2, 1, 2) x (2, 0, 0, 12) → AIC = 1750.24 SARIMA(2, 1, 2) x (2, 0, 1, 12) → AIC = 1750.02 SARIMA(2, 1, 2) x (2, 0, 2, 12) → AIC = 1737.61 SARIMA(2, 1, 2) x (2, 0, 3, 12) → AIC = 3828.52 SARIMA(2, 1, 2) x (3, 0, 0, 12) → AIC = 1577.10 SARIMA(2, 1, 2) x (3, 0, 1, 12) → AIC = 1578.93 SARIMA(2, 1, 2) x (3, 0, 2, 12) → AIC = 1580.70 SARIMA(2, 1, 2) x (3, 0, 3, 12) → AIC = 3561.93 SARIMA(2, 1, 3) x (0, 0, 0, 12) → AIC = 2377.96 SARIMA(2, 1, 3) x (0, 0, 1, 12) → AIC = 2060.15 SARIMA(2, 1, 3) x (0, 0, 2, 12) → AIC = 1823.64 SARIMA(2, 1, 3) x (0, 0, 3, 12) → AIC = 3298.59 SARIMA(2, 1, 3) x (1, 0, 0, 12) → AIC = 1930.93 SARIMA(2, 1, 3) x (1, 0, 1, 12) → AIC = 1890.95 SARIMA(2, 1, 3) x (1, 0, 2, 12) → AIC = 1717.16 SARIMA(2, 1, 3) x (1, 0, 3, 12) → AIC = 4154.31 SARIMA(2, 1, 3) x (2, 0, 0, 12) → AIC = 1751.95 SARIMA(2, 1, 3) x (2, 0, 1, 12) → AIC = 1747.48 SARIMA(2, 1, 3) x (2, 0, 2, 12) → AIC = 1733.98 SARIMA(2, 1, 3) x (2, 0, 3, 12) → AIC = 3600.29 SARIMA(2, 1, 3) x (3, 0, 0, 12) → AIC = 1578.85 SARIMA(2, 1, 3) x (3, 0, 1, 12) → AIC = 1580.68 SARIMA(2, 1, 3) x (3, 0, 2, 12) → AIC = 1582.19 SARIMA(2, 1, 3) x (3, 0, 3, 12) → AIC = 3004.08 SARIMA(3, 1, 0) x (0, 0, 0, 12) → AIC = 2417.01 SARIMA(3, 1, 0) x (0, 0, 1, 12) → AIC = 2144.79 SARIMA(3, 1, 0) x (0, 0, 2, 12) → AIC = 1899.15 SARIMA(3, 1, 0) x (0, 0, 3, 12) → AIC = 4586.68 SARIMA(3, 1, 0) x (1, 0, 0, 12) → AIC = 1943.05 SARIMA(3, 1, 0) x (1, 0, 1, 12) → AIC = 1924.15 SARIMA(3, 1, 0) x (1, 0, 2, 12) → AIC = 1783.33 SARIMA(3, 1, 0) x (1, 0, 3, 12) → AIC = 4018.13 SARIMA(3, 1, 0) x (2, 0, 0, 12) → AIC = 1759.51 SARIMA(3, 1, 0) x (2, 0, 1, 12) → AIC = 1756.89 SARIMA(3, 1, 0) x (2, 0, 2, 12) → AIC = 1755.54 SARIMA(3, 1, 0) x (2, 0, 3, 12) → AIC = 3862.65 SARIMA(3, 1, 0) x (3, 0, 0, 12) → AIC = 1580.85 SARIMA(3, 1, 0) x (3, 0, 1, 12) → AIC = 1582.24 SARIMA(3, 1, 0) x (3, 0, 2, 12) → AIC = 1582.43 SARIMA(3, 1, 0) x (3, 0, 3, 12) → AIC = 3520.15 SARIMA(3, 1, 1) 
SARIMA(3, 1, 1) x (0, 0, 0, 12) → AIC = 2390.31
SARIMA(3, 1, 1) x (0, 0, 1, 12) → AIC = 2105.43
SARIMA(3, 1, 1) x (0, 0, 2, 12) → AIC = 1864.20
SARIMA(3, 1, 1) x (0, 0, 3, 12) → AIC = 3711.77
SARIMA(3, 1, 1) x (1, 0, 0, 12) → AIC = 1920.57
SARIMA(3, 1, 1) x (1, 0, 1, 12) → AIC = 1904.43
SARIMA(3, 1, 1) x (1, 0, 2, 12) → AIC = 1750.58
SARIMA(3, 1, 1) x (1, 0, 3, 12) → AIC = 3557.50
SARIMA(3, 1, 1) x (2, 0, 0, 12) → AIC = 1739.44
SARIMA(3, 1, 1) x (2, 0, 1, 12) → AIC = 1737.97
SARIMA(3, 1, 1) x (2, 0, 2, 12) → AIC = 1761.06
SARIMA(3, 1, 1) x (2, 0, 3, 12) → AIC = 3103.90
SARIMA(3, 1, 1) x (3, 0, 0, 12) → AIC = 1563.91
SARIMA(3, 1, 1) x (3, 0, 1, 12) → AIC = 1565.56
SARIMA(3, 1, 1) x (3, 0, 2, 12) → AIC = 1566.58
SARIMA(3, 1, 1) x (3, 0, 3, 12) → AIC = 3762.19
SARIMA(3, 1, 2) x (0, 0, 0, 12) → AIC = 2392.19
SARIMA(3, 1, 2) x (0, 0, 1, 12) → AIC = 2085.22
SARIMA(3, 1, 2) x (0, 0, 2, 12) → AIC = 1845.68
SARIMA(3, 1, 2) x (0, 0, 3, 12) → AIC = 2365.36
SARIMA(3, 1, 2) x (1, 0, 0, 12) → AIC = 1960.39
SARIMA(3, 1, 2) x (1, 0, 1, 12) → AIC = 1904.81
SARIMA(3, 1, 2) x (1, 0, 2, 12) → AIC = 1736.00
SARIMA(3, 1, 2) x (1, 0, 3, 12) → AIC = 2391.01
SARIMA(3, 1, 2) x (2, 0, 0, 12) → AIC = 1738.32
SARIMA(3, 1, 2) x (2, 0, 1, 12) → AIC = 1737.25
SARIMA(3, 1, 2) x (2, 0, 2, 12) → AIC = 1736.93
SARIMA(3, 1, 2) x (2, 0, 3, 12) → AIC = 2306.62
SARIMA(3, 1, 2) x (3, 0, 0, 12) → AIC = 1563.14
SARIMA(3, 1, 2) x (3, 0, 1, 12) → AIC = 1567.57
SARIMA(3, 1, 2) x (3, 0, 2, 12) → AIC = 1566.79
SARIMA(3, 1, 2) x (3, 0, 3, 12) → AIC = 2312.64
SARIMA(3, 1, 3) x (0, 0, 0, 12) → AIC = 2354.17
SARIMA(3, 1, 3) x (0, 0, 1, 12) → AIC = 2074.24
SARIMA(3, 1, 3) x (0, 0, 2, 12) → AIC = 1835.03
SARIMA(3, 1, 3) x (0, 0, 3, 12) → AIC = 8068.02
SARIMA(3, 1, 3) x (1, 0, 0, 12) → AIC = 1914.97
SARIMA(3, 1, 3) x (1, 0, 1, 12) → AIC = 1892.77
SARIMA(3, 1, 3) x (1, 0, 2, 12) → AIC = 1722.21
SARIMA(3, 1, 3) x (1, 0, 3, 12) → AIC = 1764.21
SARIMA(3, 1, 3) x (2, 0, 0, 12) → AIC = 1739.98
SARIMA(3, 1, 3) x (2, 0, 1, 12) → AIC = 1735.67
SARIMA(3, 1, 3) x (2, 0, 2, 12) → AIC = 1729.00
SARIMA(3, 1, 3) x (2, 0, 3, 12) → AIC = 1849.69
SARIMA(3, 1, 3) x (3, 0, 0, 12) → AIC = 1565.11
SARIMA(3, 1, 3) x (3, 0, 1, 12) → AIC = 1567.01
SARIMA(3, 1, 3) x (3, 0, 2, 12) → AIC = 1568.79
SARIMA(3, 1, 3) x (3, 0, 3, 12) → AIC = 109.39
# ✅ Sort SARIMA results by AIC in ascending order (lowest AIC = best model fit)
SARIMA_AIC_sorted = SARIMA_AIC.sort_values(by='AIC', ascending=True).reset_index(drop=True)
print("📊 Top SARIMA(p,d,q) × (P,D,Q,12) models ranked by lowest AIC:")
display(SARIMA_AIC_sorted.head().style.format({'AIC': '{:,.2f}'}))
📊 Top SARIMA(p,d,q) × (P,D,Q,12) models ranked by lowest AIC:
| param | seasonal | AIC | |
|---|---|---|---|
| 0 | (1, 1, 1) | (0, 0, 3, 12) | 12.00 |
| 1 | (3, 1, 3) | (3, 0, 3, 12) | 109.39 |
| 2 | (1, 1, 3) | (0, 0, 3, 12) | 343.25 |
| 3 | (3, 1, 2) | (3, 0, 0, 12) | 1,563.14 |
| 4 | (3, 1, 1) | (3, 0, 0, 12) | 1,563.91 |
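These AIC values can be sanity-checked by hand, since AIC depends only on the log-likelihood and the number of estimated parameters. A minimal sketch; the worked example reuses the log-likelihood of -1185.506 and the 10 estimated parameters (9 coefficients plus sigma2) reported in the SARIMAX summary further down:

```python
def aic(log_likelihood, k):
    # Akaike Information Criterion: AIC = 2k - 2*ln(L); lower is better
    return 2 * k - 2 * log_likelihood

# Reproduces the summary's AIC of 2391.011 for SARIMA(3,1,2) x (1,0,3,12)
print(round(aic(-1185.506, 10), 2))  # 2391.01
```

Implausibly low values in the grid (such as 109.39) are a warning sign: they usually come from fits that failed to converge rather than from genuinely better models.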
import statsmodels.api as sm
# ✅ Train a candidate SARIMA model from the grid search
# Note: this is not the lowest-AIC combination above; the implausibly low AIC
# values (e.g. 12.00) correspond to degenerate fits
# order = (3,1,2) → Non-seasonal ARIMA(p,d,q)
# seasonal_order = (1,0,3,12) → Seasonal (P,D,Q,s) with s=12 (monthly seasonality)
auto_SARIMA = sm.tsa.statespace.SARIMAX(
df_train['Sparkling'],
order=(3,1,2),
seasonal_order=(1,0,3,12),
    enforce_stationarity=False,   # Do not constrain AR parameters to the stationary region
    enforce_invertibility=False   # Do not constrain MA parameters to the invertible region
)
# ✅ Fit the model; maxiter increased to allow proper convergence on complex seasonal models
results_auto_SARIMA = auto_SARIMA.fit(maxiter=1000)
# ✅ Display detailed model summary including AIC, BIC, coefficients, and diagnostics
print("📌 SARIMA(3,1,2) × (1,0,3,12) Model Summary:\n")
print(results_auto_SARIMA.summary())
📌 SARIMA(3,1,2) × (1,0,3,12) Model Summary:
SARIMAX Results
==================================================================================================
Dep. Variable: Sparkling No. Observations: 144
Model: SARIMAX(3, 1, 2)x(1, 0, [1, 2, 3], 12) Log Likelihood -1185.506
Date: Sun, 02 Nov 2025 AIC 2391.011
Time: 14:00:13 BIC 2417.455
Sample: 01-31-1980 HQIC 2401.725
- 12-31-1991
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -1.0985 77.217 -0.014 0.989 -152.440 150.243
ar.L2 0.2057 2171.193 9.47e-05 1.000 -4255.254 4255.666
ar.L3 -0.2747 48.280 -0.006 0.995 -94.902 94.353
ma.L1 0.8415 0.005 162.681 0.000 0.831 0.852
ma.L2 -0.7274 10.834 -0.067 0.946 -21.962 20.507
ar.S.L12 0.9089 51.078 0.018 0.986 -99.201 101.019
ma.S.L12 -1.26e+13 3.1e-06 -4.06e+18 0.000 -1.26e+13 -1.26e+13
ma.S.L24 1.397e+13 2.8e-11 4.99e+23 0.000 1.4e+13 1.4e+13
ma.S.L36 2.074e+12 2.06e-12 1.01e+24 0.000 2.07e+12 2.07e+12
sigma2 1.779e+06 2.570 6.92e+05 0.000 1.78e+06 1.78e+06
===================================================================================
Ljung-Box (L1) (Q): 9.88 Jarque-Bera (JB): 29215.95
Prob(Q): 0.00 Prob(JB): 0.00
Heteroskedasticity (H): 0.00 Skew: 8.44
Prob(H) (two-sided): 0.00 Kurtosis: 83.36
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.24e+46. Standard errors may be unstable.
# ✅ Residual diagnostics for SARIMA model
# This checks whether the model errors behave like white noise
results_auto_SARIMA.plot_diagnostics(figsize=(14, 10))
plt.suptitle("SARIMA Model Residual Diagnostics", fontsize=14, y=1.02)
plt.show()
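The Ljung-Box (Q) figure reported in these diagnostics tests whether the residual autocorrelations are jointly zero (i.e. whether the errors behave like white noise). A minimal standard-library sketch of the statistic itself, applied to a hypothetical toy series:

```python
def ljung_box_q(residuals, lags):
    # Ljung-Box statistic: Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n - k),
    # where r_k is the lag-k sample autocorrelation of the residuals
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((x - mean) ** 2 for x in residuals)
    q = 0.0
    for k in range(1, lags + 1):
        ck = sum((residuals[i] - mean) * (residuals[i - k] - mean)
                 for i in range(k, n))
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q

# A strongly alternating (hence autocorrelated) toy series yields a large Q,
# which a chi-squared test would flag as non-white-noise residuals
print(round(ljung_box_q([1, -1] * 5, lags=1), 2))  # 10.8
```

A large Q with a small Prob(Q), as seen in the summary above, means the residuals still carry structure the model failed to capture.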
# ✅ Forecasting the same number of future periods as the Test dataset
# get_forecast() produces point forecasts + prediction intervals
predicted_auto_SARIMA = results_auto_SARIMA.get_forecast(steps=len(df_test))
# ✅ Convert SARIMA forecast results into a summary DataFrame
# summary_frame(alpha=0.05) returns:
# - mean → Point forecast
# - mean_se → Standard error of the mean forecast
# - mean_ci_lower → Lower bound of 95% confidence interval
# - mean_ci_upper → Upper bound of 95% confidence interval
forecast_df = predicted_auto_SARIMA.summary_frame(alpha=0.05)
# Show the first few forecasted values
forecast_df.head()
| Sparkling | mean | mean_se | mean_ci_lower | mean_ci_upper |
|---|---|---|---|---|
| 1992-01-31 | -5.604998e+22 | NaN | NaN | NaN |
| 1992-02-29 | 7.785341e+22 | NaN | NaN | NaN |
| 1992-03-31 | -1.081384e+23 | NaN | NaN | NaN |
| 1992-04-30 | 1.502042e+23 | NaN | NaN | NaN |
| 1992-05-31 | -2.086336e+23 | NaN | NaN | NaN |
# ✅ RMSE via sklearn's root_mean_squared_error (imported earlier; avoids the
#    removed `squared=False` keyword on mean_squared_error)
rmse = root_mean_squared_error(df_test['Sparkling'], predicted_auto_SARIMA.predicted_mean)
# ✅ MAPE using our custom function
mape = mean_absolute_percentage_error(df_test['Sparkling'], predicted_auto_SARIMA.predicted_mean)
# ✅ Nicely formatted output
print(f"📌 SARIMA Model Performance on Test Data:")
print(f" ➤ RMSE : {rmse:,.2f}")
print(f" ➤ MAPE : {mape:.2f}%")
📌 SARIMA Model Performance on Test Data:
 ➤ RMSE : 12,132,559,640,424,642,843,221,426,176.00
 ➤ MAPE : 251533539866592360997584896.00%
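For reference, both metrics reduce to simple formulas. A standard-library sketch with hypothetical numbers; the notebook's sklearn RMSE and custom MAPE follow the same definitions:

```python
import math

def rmse(actual, predicted):
    # Root Mean Squared Error: penalises large errors quadratically
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    # Mean Absolute Percentage Error, in %; undefined when an actual value is 0
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100, 200, 300]       # hypothetical sales
predicted = [110, 190, 330]    # hypothetical forecasts
print(round(rmse(actual, predicted), 2))  # 19.15
print(round(mape(actual, predicted), 2))  # 8.33
```

When a model's point forecasts explode (as the SARIMA forecasts do here, reaching ~1e22), both metrics explode with them, which is why the table below shows astronomically large values.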
# ✅ Create a temporary dataframe to store SARIMA model performance
# - RMSE: Root Mean Squared Error (accuracy in unit scale)
# - MAPE: Mean Absolute Percentage Error (accuracy in %)
# Index is named after the specific SARIMA model tested
temp_resultsDf = pd.DataFrame(
{'RMSE': rmse, 'MAPE': mape},
index=['SARIMA(3,1,2)(1,0,3,12)'] # ✅ Updated to match model order: (3,1,2)
)
# ✅ Append SARIMA results to the existing model performance table (resultsDf)
# Ensures all models can be compared in one consolidated table
resultsDf = pd.concat([resultsDf, temp_resultsDf])
# ✅ Display updated table including ARIMA and SARIMA performance
resultsDf
| RMSE | MAPE | |
|---|---|---|
| ARIMA(2,1,2) | 1.309631e+03 | 4.210620e+01 |
| SARIMA(3,1,2)(1,0,3,12) | 1.213256e+28 | 2.515335e+26 |
resultsDf2 = resultsDf.sort_values(by='RMSE', ascending=True)
print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf2.style.format({'RMSE': '{:,.2f}', 'MAPE': '{:.2f}%'}))
📊 Model Performance Ranking (Lowest RMSE = Best):
| RMSE | MAPE | |
|---|---|---|
| ARIMA(2,1,2) | 1,309.63 | 42.11% |
| SARIMA(3,1,2)(1,0,3,12) | 12,132,559,640,424,642,843,221,426,176.00 | 251533539866592360997584896.00% |
# ✅ Extract only the RMSE column from the sorted results table
# resultsDf2 already contains performance results for all models
resultsDf3 = resultsDf2[['RMSE']].copy() # copy() helps avoid SettingWithCopy warnings
# ✅ Rename the column for clarity in report tables
# 'Test RMSE' clearly indicates performance measured on the test dataset
resultsDf3.rename(columns={'RMSE': 'Test RMSE'}, inplace=True)
# ✅ Display the final performance comparison table (RMSE only)
resultsDf3
| Test RMSE | |
|---|---|
| ARIMA(2,1,2) | 1.309631e+03 |
| SARIMA(3,1,2)(1,0,3,12) | 1.213256e+28 |
# ✅ Keep only the RMSE column for comparison table
resultsDf3 = resultsDf2[['RMSE']].copy() # copy() prevents SettingWithCopyWarning
# ✅ Rename the column for clear understanding in final report
resultsDf3.rename(columns={'RMSE': 'Test RMSE'}, inplace=True)
# ✅ Display final table of model performance (RMSE only)
display(resultsDf3.style.format({'Test RMSE': '{:,.2f}'}))
| Test RMSE | |
|---|---|
| ARIMA(2,1,2) | 1,309.63 |
| SARIMA(3,1,2)(1,0,3,12) | 12,132,559,640,424,642,843,221,426,176.00 |
Observations:
Coefficient instability: very large standard errors and no statistically significant AR or seasonal parameters.
Residuals exhibit abnormal spikes; the Q-Q plot deviates strongly from normality, and the residual histogram is unrealistic.
Explosive seasonal MA coefficients (on the order of 1e13) and a near-singular covariance matrix indicate poor convergence and overfitting.
The seasonal structure was not modeled correctly (no seasonal differencing, D = 0, despite strong yearly seasonality), and the parameter search surfaced degenerate fits.
Overall, the SARIMA model produced unstable residuals and extremely high error values: the chosen seasonal parameters do not suit the data, and the model is not appropriate in its current configuration.
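Given the strong yearly pattern, one plausible remedy (not attempted in this notebook) is seasonal differencing, i.e. setting D = 1 so the model works on year-over-year changes. A minimal sketch of the transform on a hypothetical periodic series:

```python
def seasonal_difference(series, m=12):
    # D=1 seasonal differencing at period m: y'_t = y_t - y_{t-m}.
    # Removes a stable yearly pattern so the seasonal AR/MA terms only
    # have to model what remains.
    return [series[i] - series[i - m] for i in range(m, len(series))]

# Purely seasonal toy series (period 4): differencing at lag 4 leaves zeros
toy = [100, 120, 90, 110] * 6
print(seasonal_difference(toy, m=4)[:4])  # [0, 0, 0, 0]
```

On the Sparkling series this would mean re-running the grid search with D = 1 in the seasonal order, e.g. SARIMA(p,1,q) x (P,1,Q,12).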
Model Comparison and Final Model Selection¶
Rebuild the best model on the entire dataset and produce the next 12-month forecast
comparison_model_table = pd.concat([resultsDf1, resultsDf3])
comparison_model_table.sort_values(by = 'Test RMSE')
| Test RMSE | |
|---|---|
| TES (α=0.50, β=0.30, γ=0.30) | 4.126138e+02 |
| SimpleAverageModel | 1.268683e+03 |
| SES_Model | 1.293814e+03 |
| ARIMA(2,1,2) | 1.309631e+03 |
| SES (Alpha = 0.30) | 1.900059e+03 |
| DES_Model | 2.654534e+03 |
| DES (Alpha=0.30, Beta=0.30) | 1.381440e+04 |
| Regression On Time Model | 1.840430e+06 |
| NaiveForecast | 1.583972e+07 |
| SARIMA(3,1,2)(1,0,3,12) | 1.213256e+28 |
# ✅ Combine model performance results
comparison_model_table = pd.concat([resultsDf1, resultsDf3])
# ✅ Sort models by Test RMSE (Lowest RMSE = Best Performance)
comparison_model_table = comparison_model_table.sort_values(by='Test RMSE', ascending=True)
# ✅ Display formatted table
print("📊 Model Comparison and Final Model Selection (Ranked by Test RMSE)")
display(comparison_model_table.style.format({'Test RMSE': '{:,.2f}'}))
📊 Model Comparison and Final Model Selection (Ranked by Test RMSE)
| Test RMSE | |
|---|---|
| TES (α=0.50, β=0.30, γ=0.30) | 412.61 |
| SimpleAverageModel | 1,268.68 |
| SES_Model | 1,293.81 |
| ARIMA(2,1,2) | 1,309.63 |
| SES (Alpha = 0.30) | 1,900.06 |
| DES_Model | 2,654.53 |
| DES (Alpha=0.30, Beta=0.30) | 13,814.40 |
| Regression On Time Model | 1,840,430.14 |
| NaiveForecast | 15,839,720.95 |
| SARIMA(3,1,2)(1,0,3,12) | 12,132,559,640,424,642,843,221,426,176.00 |
Observations:
Among all the evaluated forecasting methods, the Triple Exponential Smoothing (Holt-Winters) model with alpha = 0.50, beta = 0.30, gamma = 0.30 yielded the lowest RMSE (approx. 412.61), making it the best-performing model for forecasting sparkling wine sales.
The Naive and SARIMA models failed outright, while ARIMA and the other smoothing models gave moderate results that did not match the tuned Holt-Winters model.
Forecasting using Final Model¶
# ✅ Refit Triple Exponential Smoothing (Holt-Winters) on the full dataset
# Additive trend + multiplicative seasonality (peaks grow with the overall level)
model = ExponentialSmoothing(df['Sparkling'], trend='additive', seasonal='multiplicative',
                             seasonal_periods=12, freq='M')
# fit model with fixed smoothing parameters (no optimization)
model_fit = model.fit(smoothing_level=0.3, smoothing_trend=0.3, smoothing_seasonal=0.3,
                      optimized=False, use_brute=True)
# make a 12-month out-of-sample prediction (ISO dates avoid ambiguous parsing)
yhat = model_fit.predict(start='1995-08-31', end='1996-08-31')
plt.figure(figsize=(18,9)) # Large figure for presentation quality
# ✅ Plot the original Sparkling sales data
plt.plot(df['Sparkling'], label='Actual Sales Data', linewidth=2)
# ✅ Plot model predictions (future forecast)
plt.plot(yhat,
label='TES Forecast (α = β = γ = 0.3)',
linestyle='--',
linewidth=2)
# ✅ Visualization enhancements
plt.title("Triple Exponential Smoothing (Holt-Winters) Forecast", fontsize=16)
plt.xlabel("Time", fontsize=14)
plt.ylabel("Sparkling Wine Sales", fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(loc='best', fontsize=12)
plt.tight_layout()
plt.show()
- In the code below, we calculate the upper and lower confidence bands at the 95% confidence level.
- The multiplier is 1.96 because that z-value bounds the central 95% of a standard normal distribution.
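The 1.96 multiplier is the standard-normal quantile that leaves 2.5% probability in each tail; it can be recovered directly from the Python standard library:

```python
from statistics import NormalDist

# Two-sided 95% interval -> upper quantile at 1 - 0.05/2 = 0.975
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96
```

For a 90% or 99% band the same call with 0.95 or 0.995 gives roughly 1.64 or 2.58.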
# ✅ Create a forecast results DataFrame including confidence intervals
# Standard deviation of residuals → used for CI width
resid_std = np.std(model_fit.resid, ddof=1)
pred_1_df = pd.DataFrame({
'lower_CI': yhat - 1.96 * resid_std, # Lower 95% confidence interval
'prediction': yhat, # Forecasted values
'upper_CI': yhat + 1.96 * resid_std # Upper 95% confidence interval
})
# ✅ Display the first few rows of forecast with CI
pred_1_df.head()
| lower_CI | prediction | upper_CI | |
|---|---|---|---|
| 1995-08-31 | 1014.069331 | 1854.686308 | 2695.303286 |
| 1995-09-30 | 1650.369143 | 2490.986121 | 3331.603098 |
| 1995-10-31 | 2498.588142 | 3339.205119 | 4179.822096 |
| 1995-11-30 | 3415.513796 | 4256.130773 | 5096.747751 |
| 1995-12-31 | 6048.795192 | 6889.412170 | 7730.029147 |
pred_1_df.tail()
| lower_CI | prediction | upper_CI | |
|---|---|---|---|
| 1996-04-30 | 1573.333217 | 2413.950194 | 3254.567172 |
| 1996-05-31 | 1335.714857 | 2176.331834 | 3016.948811 |
| 1996-06-30 | 1212.044951 | 2052.661928 | 2893.278906 |
| 1996-07-31 | 1597.967939 | 2438.584916 | 3279.201894 |
| 1996-08-31 | 1522.187178 | 2362.804155 | 3203.421132 |
# ✅ Plot Forecast vs Actual with Confidence Intervals
# Create a large, clear figure for presentation
plt.figure(figsize=(18,9))
# ✅ Plot the historical actual sales data
plt.plot(df['Sparkling'],
label='Actual Data',
linewidth=2)
# ✅ Plot the forecasted values from Triple Exponential Smoothing (TES)
plt.plot(pred_1_df['prediction'],
label='TES Forecast (α = β = γ = 0.3)',
linestyle='--',
linewidth=2)
# ✅ Add 95% Confidence Interval shading around the forecast
# The shaded area represents uncertainty in future predictions
plt.fill_between(pred_1_df.index,
pred_1_df['lower_CI'],
pred_1_df['upper_CI'],
color='gray',
alpha=0.3,
label='95% Confidence Interval')
# ✅ Labels and formatting for readability and professional presentation
plt.title("Triple Exponential Smoothing (TES) Forecast with 95% Confidence Interval",
fontsize=16)
plt.xlabel("Time", fontsize=14)
plt.ylabel("Sparkling Sales", fontsize=14)
plt.legend(loc='best', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
# ✅ Display the final plot
plt.show()
Observations:
The Holt-Winters Triple Exponential Smoothing model provides the best forecast accuracy, effectively capturing the overall level, the upward trend, and the strong seasonal December spikes of sparkling wine sales. (The test-set winner used alpha = 0.50, beta = 0.30, gamma = 0.30; the refit shown here fixes all three parameters at 0.30.)
The 95% confidence band has constant width by construction (±1.96 times the residual standard deviation) and stays narrow relative to the seasonal swings, consistent with a well-fitted model.
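For intuition, with an additive trend and multiplicative seasonality the h-step-ahead Holt-Winters point forecast multiplies the extrapolated level by the matching seasonal factor. A sketch with hypothetical state values (the index handling assumes forecasting starts right after a completed 12-month cycle):

```python
def tes_forecast(level, trend, seasonals, h):
    # Holt-Winters point forecast (additive trend, multiplicative seasonality):
    # y_hat(t+h) = (l_t + h * b_t) * s_{t+h-m}, cycling through the m factors
    m = len(seasonals)
    return (level + h * trend) * seasonals[(h - 1) % m]

# Hypothetical end-of-sample state: level 2400, slight upward trend,
# December factor ~2.5 (the seasonal spike) vs ~0.8 in softer months
seasonals = [0.8] * 11 + [2.5]
print(tes_forecast(2400, 5, seasonals, h=12))  # 6150.0
```

This is why multiplicative seasonality suits this series: as the level rises, the December forecast spike rises proportionally.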
Business Insights and Recommendations¶
Business Insights¶
Strong, consistent seasonality: sales peak every December, shoulder lifts usually begin in October or November, and demand softens between May and August.
Long-term upward trend: underlying demand is rising, as annual totals have generally increased over the period.
Right-skewed demand: most months fall in the middle range, with a few exceptionally high months driving a significant portion of volume.
Model performance: Holt-Winters (TES) has the lowest RMSE and best captures trend and seasonality; ARIMA is mediocre, while the naive, average, and linear-regression models perform worse.
Residual diagnostics: multiplicative seasonality fits the behaviour well (peaks grow with the overall level).
Recommendations¶
- Planning for Inventory and Supplies
Implement seasonal inventory profiles: build stock from September, peak in November, and draw down through January and February.
Drive monthly procurement and master production schedule (MPS) plans from the TES forecasts.
- Marketing and Sales
Launch holiday marketing campaigns (gifting, party packages) in early November and run them through the new year.
Use shoulder-season promotions (May to August) to smooth demand, e.g. value packs, small discounts, and cross-selling with still wines.
Create limited-edition seasonal SKUs (such as a "Holiday Sparkling Edition") to capitalise on December's demand elasticity.
- Revenue and Price Management
Apply calendar-based pricing: support prices tactically off-peak and hold prices during peak weeks.
Use mix and bundle techniques (accessory or cheese pairings with sparkling wine) to increase basket size.
- Staffing and Operations
Scale labour and logistics for November and December: hire temporary warehouse workers, extend picking and packing windows, and reserve carrier capacity.
Pre-position inventory in regional DCs to cut lead times and protect stock levels during peak periods.
- Store and Channel
Secure Q4 feature placements and end caps, and negotiate guaranteed shelf facings in October.
For direct-to-consumer and e-commerce, offer pre-orders with guaranteed ship-by dates for holiday delivery.
- Financial Planning & Analysis / Sales & Operations Planning
Refresh a 12-to-18-month forecast every month and align it with the S&OP cadence (Demand, Supply, Finance).
Track forecast accuracy by month (MAPE, WAPE) and service levels, and tighten targets for Q4.
- Analytics improvements (next iterations)
Build models per wine type (sparkling, red, white, rosé) to optimise the portfolio.
Add external drivers such as holidays, promotions, prices, weather, and consumer sentiment.
Run scenario planning (Base, Stretch, Downside) to size capacity and working-capital needs.
KPIs to Monitor¶
Forecast Accuracy (MAPE/WAPE) by SKU and month
Fill Rate / OTIF (on time, in full) in November and December
Stockout days and backorder volume
Gross margin by channel and promotion return on investment (ROI)
Inventory turns and days of supply (especially in Q4)
Immediate Actions (Next 30 to 60 Days)¶
Adopt the TES model as the production forecast for sparkling wine, then extend it to the other wine types.
Publish a 12-month demand plan and reserve Q4 warehouse and logistics capacity.
Lock the holiday promotion calendar and retail placements by the end of Q3.
Run a weekly peak-season control tower across supply, sales, and logistics from November through December.
from google.colab import files
uploaded_files = files.upload()
Saving Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb
if len(uploaded_files) == 1:
notebook_name = next(iter(uploaded_files))
print(f"Converting notebook {notebook_name} to HTML.")
!pip install nbconvert
!jupyter nbconvert "$notebook_name" --to html
html_name = notebook_name.rsplit('.', 1)[0] + '.html'
files.download(html_name)
else:
print("Please upload exactly one .ipynb file.")
Converting notebook Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to HTML.
[NbConvertApp] Converting notebook Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 26 image(s).
[NbConvertApp] Writing 4933838 bytes to Full_code_notebook_TSF_project_Joseph_Anthony_Tan.html